postgres=# PostgreSQL Master Reference · v18 · 9 posters contents 1234 56789 ⎙ PDF
postgres=#\i master_reference.sql

PostgreSQL Master Reference

SQL grammar in EBNF, the logical data-flow that explains every rule, and the practical fluency to write correct queries — a progressive learning arc, not a feature dump.

Verified against PostgreSQL 18 (stable 18.4) · notation: [ ] optional · { } repeat · | choice

postgres=#\h SELECT01 / 09

The SELECT Statement — Grammar & Execution Flow

SQL is written in one order and executed in another. The grammar says what's legal; the pipeline says why — why an alias works in ORDER BY but not WHERE, why aggregates are banned in WHERE but allowed in HAVING.

nonterminal keyword metasymbol PostgreSQL extension

Complete SELECT grammar (written order)

select_query = [ with_clause ] query_body [ ORDER BY sort_item {"," sort_item} ] [ limit_clause ] [ lock ] ";" ; query_body = query_term { ( UNION | EXCEPT ) [ ALL ] query_term } ; query_term = query_prim { INTERSECT [ ALL ] query_prim } ; query_prim = select_block | VALUES rows | TABLE name | "(" query_body ")" ; select_block = SELECT [ ALL | DISTINCT [ ON "(" expr {"," expr} ")" ] ] select_list [ FROM from_item {"," from_item} ] -- PG: FROM optional → SELECT now(); [ WHERE condition ] [ GROUP BY group_item {"," group_item} ] [ HAVING condition ] [ WINDOW name AS "(" window_def ")" ] ; limit_clause = LIMIT (n|ALL) [ OFFSET m ] | OFFSET m [ FETCH (FIRST|NEXT) n ROWS (ONLY|WITH TIES) ] ;

The execution pipeline — tables in, table out

1
FROM / JOINrelation ⋈
Assemble sources; multiple tables cross-join, then JOIN conditions apply. → T₁
2
WHEREselect σ
Keep rows where predicate is TRUE (not FALSE/NULL). No aggregates — groups don't exist yet. → T₂
3
GROUP BYgroup γ
Partition by key; collapse each group to one row. Cardinality drops. → T₃
4
HAVINGselect σ
Filter group rows; aggregates legal here because groups now exist. → T₄
5
SELECTproject π
Evaluate output expressions & window functions; assign aliases — result columns now exist. → T₅
6
DISTINCTdedup δ
Remove duplicate result rows. → T₆
7
ORDER BYsort τ
Impose order; aliases from step 5 visible here — the only place order is guaranteed. → T₇
8
LIMIT / OFFSETslice
Skip OFFSET, return at most LIMIT. Delivered to client. → RESULT
The alias rule, explained by the pipeline. WHERE (2) runs before SELECT (5), so a SELECT alias is invisible to WHERE/HAVING — repeat the full expression. ORDER BY (7) runs after SELECT, so it can use the alias.

Relational-algebra equivalent

SELECT dept, SUM(sales) AS total FROM s WHERE region='East' GROUP BY dept HAVING SUM(sales)>150 ORDER BY total DESC; τtotal↓ ( πdept,total ( σSUM>150 ( deptγSUM(sales)→total ( σregion='East' ( s ) ) ) ) ) sort project HAVING GROUP+aggregate WHERE source
Three PostgreSQL divergences.LIMIT/OFFSET aren't standard (portable: OFFSET … FETCH). ② DISTINCT ON (k) keeps the first row per key — PG-only. ③ FROM may be omitted entirely.
Execution order is logical; the planner may reorder physically but results match this model.postgresql.org/docs/18/sql-select.html
postgres=#\dT -- data types02 / 09

Data Types — the Vocabulary

Every column, expression and comparison has a type. Choosing well prevents silent bugs (float money, naive timestamps) and unlocks the right operators and indexes.

Numeric

TypeUse forNote
smallint/int/bigintWhole numbers2 / 4 / 8 bytes
numeric(p,s)Money, exactArbitrary precision, no rounding error
real / double precisionScientificFloat — never for currency
serial / bigserialAuto-incrementLegacy; prefer GENERATED … IDENTITY

Text

TypeUse forNote
textDefault choiceUnlimited; no perf penalty in PG
varchar(n)Length-cappedEnforces a max; otherwise = text
char(n)Fixed widthSpace-padded; rarely wanted
text and varchar perform identically in PG — reach for text unless you need a hard length limit.

Temporal

TypeStores
dateCalendar day, no time
time [tz]Time of day
timestampDate+time, no zone (naive)
timestamptzZone-aware — use this
intervalDuration ('2 days 3 hours')
Default to timestamptz. It stores UTC and converts on display; timestamp drops zone context and causes off-by-hours bugs.

Boolean, UUID & semi-structured

TypeValues / use
booleanTRUE / FALSE / NULL
uuid128-bit id; gen_random_uuid()
byteaRaw binary
enumCREATE TYPE mood AS ENUM(…)
jsonbIndexed JSON; -> ->> @>
type[]Array: int[], text[]
tsvector / int4rangeFull-text / range types

Casting & literals

SELECT '2026-06-29'::date, -- PG cast shorthand :: CAST('42' AS int), -- SQL-standard cast 3.0::numeric / 2, -- 1.5 (int/int truncates to 1) '{1,2,3}'::int[], -- array literal '{"a":1}'::jsonb ->> 'a'; -- → '1' (text)
Integer division truncates. 5/2 = 2. Force real division by casting one side: 5::numeric/2 = 2.5.
Type drives operators, indexes & comparison semantics — choose deliberately.postgresql.org/docs/18/datatype.html
postgres=#-- WHERE · NULL · CASE03 / 09

Filtering, NULL Logic & CASE

WHERE keeps a row only when its condition is TRUE. NULL turns Boolean logic from two values into three — the source of more silent bugs than any other SQL feature.

Predicate toolkit

PredicateMatchesExample
= <> < > <= >=Comparisonprice >= 100
BETWEEN a AND bInclusive rangeage BETWEEN 18 AND 65
IN (…)Any of a setstate IN ('AP','TS')
LIKE / ILIKEPattern (% _); ILIKE case-insensitivename ILIKE 'kal%'
~ / ~*POSIX regex / case-insensitivecode ~ '^[A-Z]{3}$'
IS [NOT] NULLNull testdeleted_at IS NULL
IS [NOT] DISTINCT FROMNULL-safe equalitya IS NOT DISTINCT FROM b

Three-valued logic

Any comparison with NULL yields UNKNOWN. A row passes WHERE only on TRUE.

ANDTFNULL
TTFNULL
FFFF
NULLNULLFNULL
ORTFNULL
TTTT
FTFNULL
NULLTNULLNULL

NULL traps & fixes

TrapFix
x = NULL never truex IS NULL
NOT IN (…,NULL) → 0 rowsFilter NULLs / NOT EXISTS
COUNT(col) skips NULLsCOUNT(*) for total rows
a = b false when both NULLIS NOT DISTINCT FROM
'x'||NULL → NULLCOALESCE(col,'')
UNIQUE allows many NULLs. NULL ≠ NULL, so duplicate NULLs pass a unique constraint unless you add NULLS NOT DISTINCT (PG 15+).

Conditional expressions

-- CASE: the if/else of SQL SELECT name, CASE WHEN score >= 90 THEN 'A' WHEN score >= 75 THEN 'B' ELSE 'C' END AS grade FROM students; -- NULL helpers COALESCE(a, b, 0) -- first non-NULL argument NULLIF(x, 0) -- NULL if x=0 (guard division) GREATEST(a,b,c) / LEAST(a,b,c) -- ignore NULLs
Conditional aggregation — pivot without a pivot: SUM(CASE WHEN region='East' THEN sales ELSE 0 END). Cleaner: SUM(sales) FILTER (WHERE region='East').
WHERE filters rows (pre-group); HAVING filters groups (post-group).postgresql.org/docs/18/functions-conditional.html
postgres=#\df -- functions04 / 09

Functions & Operators

The working toolkit, grouped by what you're transforming. Scalar functions act per row; aggregates collapse many rows into one.

String

FunctionResult
a || bConcatenate
length(s)Character count
lower / upper / initcapCase transforms
trim / ltrim / rtrimStrip whitespace
substring(s,from,len)Extract slice
replace(s,old,new)Substitute
split_part(s,delim,n)Nth field
position(sub in s)Index of substring
format('%s=%s',k,v)printf-style build
string_agg(x,',')aggregate: join rows

Date / time

FunctionResult
now() / current_dateCurrent moment / day
age(ts)Interval from now
date_trunc('month',ts)Round down to unit
extract(year from ts)Pull a field
ts + interval '1 day'Date arithmetic
to_char(ts,'YYYY-MM')Format to text
to_date(s,'DD/MM/YYYY')Parse from text
generate_series(a,b,step)Row-set of values

Math & conversion

FunctionResult
round(x,n) / truncRound / truncate
ceil / floor / absStandard math
mod(a,b) / a % bRemainder
power(a,b) / sqrtExponent / root
random()0 ≤ x < 1
x::type / cast(x as t)Convert type

Aggregates

FunctionResult
count(*) / count(col)All rows / non-NULL
sum / avg / min / maxStandard rollups
array_agg(x)Collect into array
string_agg(x,sep)Collect into string
jsonb_agg(x)Collect into JSON
bool_and / bool_orLogical rollup
percentile_cont(0.5) WITHIN GROUP (ORDER BY x)Median
FILTER beats CASE inside aggregates. Instead of SUM(CASE WHEN paid THEN amt END), write SUM(amt) FILTER (WHERE paid) — clearer and NULL-clean. Any aggregate accepts FILTER (WHERE …).
DISTINCT inside an aggregate. COUNT(DISTINCT user_id) counts unique values; plain COUNT(user_id) counts non-NULL occurrences — they differ whenever duplicates exist.
Scalar = per row · Aggregate = many → one · both composable with window OVER.postgresql.org/docs/18/functions.html
postgres=#\d -- combine relations05 / 09

JOINs, Set Ops & Subqueries

Three ways to combine relations: JOINs widen rows (more columns), set operations stack rows (same columns), subqueries nest one query inside another.

Join grammar

from_item = source { join } ; join = ( [INNER] | (LEFT|RIGHT|FULL) [OUTER] ) JOIN source ( ON cond | USING "("col…")" ) | CROSS JOIN source | NATURAL JOIN source | [LATERAL] ;
JoinKeepsUnmatched rows
INNERMatching pairs onlyDropped both sides
LEFT OUTERAll left + matchesLeft kept, right = NULL
RIGHT OUTERAll right + matchesRight kept, left = NULL
FULL OUTEREverythingEither side padded NULL
CROSSEvery combinationCartesian product
LATERALPer-row correlated subqueryRight may reference left's columns

ON vs USING vs NATURAL

ON a.id=b.id — explicit, both columns survive.
USING (id) — equal-named cols, merged to one.
NATURAL JOIN — auto-matches all shared names; convenient but fragile — a new shared column silently changes results.

Set operations

q1 UNION [ALL] q2 -- combine q1 INTERSECT q2 -- in both q1 EXCEPT q2 -- in q1 not q2

Same column count & compatible types. Default dedups; ALL keeps dups. INTERSECT binds tighter than UNION/EXCEPT.

Subqueries — four shapes

-- 1. Scalar: returns one value, usable anywhere a value fits SELECT name, salary, (SELECT avg(salary) FROM emp) AS company_avg FROM emp; -- 2. IN / ANY / ALL: membership against a column of values SELECT * FROM emp WHERE dept_id IN (SELECT id FROM dept WHERE region='East'); -- 3. EXISTS: correlated true/false test — stops at first match, NULL-safe SELECT * FROM dept d WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id); -- 4. Derived table: a subquery in FROM, aliased like a table SELECT region, max_sal FROM (SELECT region, max(salary) max_sal FROM emp GROUP BY region) t;
EXISTS over IN for NULL safety. NOT IN (subquery) returns zero rows if the subquery yields a single NULL. NOT EXISTS has no such trap and usually plans better for anti-joins.
JOIN = wider rows · SET = stacked rows · SUBQUERY = nested logic. Correlated subqueries re-run per outer row.postgresql.org/docs/18/queries-table-expressions.html
postgres=#-- collapse vs keep06 / 09

GROUP BY vs Window Functions

Same arithmetic, opposite shapes. GROUP BY folds many rows into one summary. A window function keeps every row and attaches a value computed over a window of related rows.

GROUP BY — collapses (N → G)

SELECT dept, SUM(sales) AS total FROM s GROUP BY dept;
depttotal
East400
West800

Employees gone — you can't also select employee; no single value per group.

WINDOW — preserves (N → N)

SELECT employee, dept, sales, SUM(sales) OVER (PARTITION BY dept) AS dt FROM s;
empdeptsalesdt
AshaEast100400
BinaEast300400
DeepaWest600800

Every row survives and knows its group total. Detail and aggregate together.

The OVER clause grammar

window_call = func "("[args]")" [ FILTER "("WHERE cond")" ] OVER ( window_def ) ; window_def = [ PARTITION BY expr… ] -- like GROUP BY, rows NOT collapsed [ ORDER BY sort… ] -- order for ranking & running totals [ frame ] ; frame = (ROWS|RANGE|GROUPS) BETWEEN start AND end ; start/end = UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING ;

Window function catalog

FunctionReturnsNeeds ORDER BY
ROW_NUMBER()Unique sequential indexYes
RANK() / DENSE_RANK()Rank with / without gaps after tiesYes
LAG(x,n) / LEAD(x,n)Value n rows back / aheadYes
SUM(x) OVER(ORDER BY …)Running / cumulative totalYes
NTILE(k)Split into k bucketsYes
FIRST_VALUE / LAST_VALUE / NTH_VALUEValue at frame edge / positionUsually
Decision rule. One row per group (a report)? → GROUP BY. Every row plus a peer-computed value (rank, running total, share, diff-from-avg)? → window OVER (…). Windowing runs after grouping, so the two can coexist.
Math: GROUP BY = γ in bag algebra (many→one). Window = per-row map over an ordered context (one→one). Neither is in pure relational calculus.postgresql.org/docs/18/tutorial-window.html
postgres=#WITH RECURSIVE …07 / 09

CTEs & Recursion

A Common Table Expression names a subquery up front, so complex logic reads top-to-bottom instead of inside-out. RECURSIVE extends this to hierarchies and graphs — the standard way to walk a tree.

Grammar

with_clause = WITH [ RECURSIVE ] cte {"," cte} ; cte = name [ "("col…")" ] AS [ [NOT] MATERIALIZED ] "(" query ")" ;

Plain CTE — readability

WITH regional AS ( SELECT region, SUM(sales) total FROM s GROUP BY region ) SELECT * FROM regional WHERE total > 500 ORDER BY total DESC;

Name the intermediate result once, reuse it. Chains of CTEs read as sequential steps.

MATERIALIZED control

PG may inline a CTE into the outer query for speed. Force "compute once" with AS MATERIALIZED, or allow inlining with NOT MATERIALIZED. Use MATERIALIZED when the CTE is expensive and referenced multiple times.

Recursive CTE — walking a hierarchy

Two parts joined by UNION ALL: a base case (anchor) and a recursive case that references the CTE itself. PG iterates until the recursive part returns no new rows.

-- employees(id, name, manager_id) → everyone under Asha, with depth WITH RECURSIVE chain AS ( -- base case: the starting node SELECT id, name, manager_id, 1 AS depth FROM employees WHERE name = 'Asha' UNION ALL -- recursive case: children of rows already found SELECT e.id, e.name, e.manager_id, c.depth + 1 FROM employees e JOIN chain c ON e.manager_id = c.id ) SELECT repeat(' ', depth-1) || name AS tree, depth FROM chain ORDER BY depth;

Input: employees

idnamemgr
1Asha
2Bina1
3Chandu1
4Deepa2

Output: chain

treedepth
Asha1
  Bina2
  Chandu2
    Deepa3
Guard against infinite loops. On cyclic graphs an unguarded recursive CTE never terminates. PG 14+ offers UNION … CYCLE col SET is_cycle USING path; otherwise track a visited path array and filter.
Data-modifying CTEs (PG extension). WITH moved AS (DELETE FROM old WHERE … RETURNING *) INSERT INTO archive SELECT * FROM moved; — delete and re-insert atomically in one statement.
CTEs = top-down readability · RECURSIVE = anchor + UNION ALL self-reference until fixpoint.postgresql.org/docs/18/queries-with.html
postgres=#\h -- DML·DDL·TCL·DCL08 / 09

Beyond SELECT: Write, Define, Control

DML changes rows · DDL changes structure · TCL controls transactions · DCL controls access.

DML — manipulate data

insert = INSERT INTO table ["("col…")"] ( VALUES row… | select_query ) [ ON CONFLICT [target] ( DO NOTHING | DO UPDATE SET col"="EXCLUDED.col ) ] -- UPSERT [ RETURNING ("*"|expr…) ] ; -- RETURNING = PG ext update = UPDATE table SET col"="expr… [ FROM from_item ] [ WHERE cond ] [ RETURNING ] ; delete = DELETE FROM table [ USING from_item ] [ WHERE cond ] [ RETURNING ] ; merge = MERGE INTO tgt USING src ON cond { WHEN [NOT] MATCHED THEN (INSERT|UPDATE|DELETE) } ;
WHERE-less UPDATE/DELETE hits every row. Wrap risky writes in BEGIN; … ROLLBACK; to preview the row count before committing.

DDL — define structure

create_table = CREATE [TEMP|UNLOGGED] TABLE [IF NOT EXISTS] name "(" col type {constraint} {"," } ")" [ PARTITION BY (RANGE|LIST|HASH) ] ; constraint = PRIMARY KEY | UNIQUE [NULLS NOT DISTINCT] | NOT NULL | DEFAULT expr | CHECK "("cond")" | GENERATED ALWAYS AS IDENTITY | REFERENCES tbl [ON DELETE action] ; alter = ALTER TABLE name ( ADD|DROP|ALTER COLUMN | ADD CONSTRAINT | RENAME ) ; drop = DROP (TABLE|VIEW|INDEX) [IF EXISTS] name [CASCADE|RESTRICT] ;

TCL — transactions

BEGIN [ISOLATION LEVEL lvl]; SAVEPOINT sp; ROLLBACK TO sp; COMMIT; ROLLBACK;

Levels: READ COMMITTED (default) · REPEATABLE READ · SERIALIZABLE. ACID is per-transaction.

DCL — access

GRANT priv ON obj TO role; REVOKE priv ON obj FROM role;

priv = SELECT/INSERT/UPDATE/DELETE/ALL. Row-level security via CREATE POLICY.

RETURNING, ON CONFLICT, data-modifying CTEs & NULLS NOT DISTINCT are PostgreSQL extensions.postgresql.org/docs/18/sql-commands.html
postgres=#EXPLAIN ANALYZE …09 / 09

Indexes & EXPLAIN Performance

An index is a shortcut the planner may use. Knowing which index suits which query — and reading the plan to confirm it's used — is the difference between a 5ms and a 5s query.

Index types — which when

TypeBest forExample
B-tree (default)Equality & range on scalars; sortingWHERE price > 100 ORDER BY price
HashEquality onlyWHERE id = 42
GINMulti-value: jsonb, arrays, full-textWHERE tags @> '{sql}'
GiSTGeometric, ranges, nearest-neighbourWHERE geom && box
BRINHuge, naturally-ordered tables (time-series)WHERE created_at > …
SP-GiSTNon-balanced: quadtrees, IP rangesWHERE inet && '10.0.0.0/8'

Targeted index recipes

-- composite: order matters (leftmost prefix) CREATE INDEX ON orders (cust_id, created_at); -- partial: index only rows you query CREATE INDEX ON users (email) WHERE active; -- expression: match the WHERE expression CREATE INDEX ON users (lower(email)); -- covering: serve query from index alone CREATE INDEX ON orders (cust_id) INCLUDE (total);

Why an index gets ignored

Function on the column: WHERE lower(email)=… skips a plain email index — index the expression instead.
Leading wildcard: LIKE '%x' can't use B-tree.
Type mismatch forces a cast.
Tiny table: a seq scan is genuinely faster.
Stale stats: run ANALYZE.

Reading EXPLAIN

EXPLAIN [ ANALYZE ] [ BUFFERS ] [ FORMAT (TEXT|JSON) ] statement ; -- ANALYZE actually runs it, showing real rows + timing (use a transaction for writes). Index Scan using orders_cust_idx on orders (cost=0.42..8.4 rows=1 width=36) (actual time=0.018..0.020 rows=1 loops=1) ← actual ≈ estimate = healthy
Plan nodeMeaningConcern
Seq ScanRead whole tableBad on large tables w/ selective filter
Index Scan / Index Only ScanUse index; "Only" = no heap fetchGood
Bitmap Heap ScanMany matching rows via indexFine for medium selectivity
Nested LoopProbe inner per outer rowBlows up if outer rows large
Hash Join / Merge JoinBuild hash / merge sorted inputsGood for large joins
The estimate-vs-actual gap is the tell. When the planner expected rows=1 but got rows=50000, its choices rested on wrong stats — run ANALYZE, then match the index to the actual WHERE/JOIN/ORDER BY columns.
Version note. Verified on PostgreSQL 18 (stable 18.4, May 2026); v19 GA expected Sept 2026 — additive only. Always run the latest minor of your major version.
Index = optional shortcut · EXPLAIN ANALYZE = ground truth · match index to real query shape.postgresql.org/docs/18/using-explain.html