postgres=#\h SELECT█01 / 09

The SELECT Statement — Grammar & Execution Flow

SQL is written in one order and executed in another. The grammar says what's legal; the pipeline says why — why an alias works in ORDER BY but not WHERE, why aggregates are banned in WHERE but allowed in HAVING.

nonterminal keyword metasymbol PostgreSQL extension

Complete SELECT grammar (written order)

select_query = [ with_clause ] query_body [ ORDER BY sort_item {"," sort_item} ] [ limit_clause ] [ lock ] ";" ; query_body = query_term { ( UNION | EXCEPT ) [ ALL ] query_term } ; query_term = query_prim { INTERSECT [ ALL ] query_prim } ; query_prim = select_block | VALUES rows | TABLE name | "(" query_body ")" ; select_block = SELECT [ ALL | DISTINCT [ ON "(" expr {"," expr} ")" ] ] select_list [ FROM from_item {"," from_item} ] -- PG: FROM optional → SELECT now(); [ WHERE condition ] [ GROUP BY group_item {"," group_item} ] [ HAVING condition ] [ WINDOW name AS "(" window_def ")" ] ; limit_clause = LIMIT (n|ALL) [ OFFSET m ] | OFFSET m [ FETCH (FIRST|NEXT) n ROWS (ONLY|WITH TIES) ] ;

The execution pipeline — tables in, table out

1

FROM / JOINrelation ⋈

Assemble sources; multiple tables cross-join, then JOIN conditions apply. → T₁

2

WHEREselect σ

Keep rows where predicate is TRUE (not FALSE/NULL). No aggregates — groups don't exist yet. → T₂

3

GROUP BYgroup γ

Partition by key; collapse each group to one row. Cardinality drops. → T₃

4

HAVINGselect σ

Filter group rows; aggregates legal here because groups now exist. → T₄

5

SELECTproject π

Evaluate output expressions & window functions; assign aliases — result columns now exist. → T₅

6

DISTINCTdedup δ

Remove duplicate result rows. → T₆

7

ORDER BYsort τ

Impose order; aliases from step 5 visible here — the only place order is guaranteed. → T₇

8

LIMIT / OFFSETslice

Skip OFFSET, return at most LIMIT. Delivered to client. → RESULT

The alias rule, explained by the pipeline. WHERE (2) runs before SELECT (5), so a SELECT alias is invisible to WHERE/HAVING — repeat the full expression. ORDER BY (7) runs after SELECT, so it can use the alias.

Relational-algebra equivalent

SELECT dept, SUM(sales) AS total FROM s WHERE region='East' GROUP BY dept HAVING SUM(sales)>150 ORDER BY total DESC; τ_total↓ ( π_dept,total ( σ_SUM>150 ( _deptγ_{SUM(sales)→total} ( σ_{region='East'} ( s ) ) ) ) ) sort project HAVING GROUP+aggregate WHERE source

Three PostgreSQL divergences. ① LIMIT/OFFSET aren't standard (portable: OFFSET … FETCH). ② DISTINCT ON (k) keeps the first row per key — PG-only. ③ FROM may be omitted entirely.

Execution order is logical; the planner may reorder physically but results match this model.postgresql.org/docs/18/sql-select.html

postgres=#\dT -- data types█02 / 09

Data Types — the Vocabulary

Every column, expression and comparison has a type. Choosing well prevents silent bugs (float money, naive timestamps) and unlocks the right operators and indexes.

Numeric

Type	Use for	Note
`smallint/int/bigint`	Whole numbers	2 / 4 / 8 bytes
`numeric(p,s)`	Money, exact	Arbitrary precision, no rounding error
`real / double precision`	Scientific	Float — never for currency
`serial / bigserial`	Auto-increment	Legacy; prefer `GENERATED … IDENTITY`

Text

Type	Use for	Note
`text`	Default choice	Unlimited; no perf penalty in PG
`varchar(n)`	Length-capped	Enforces a max; otherwise = text
`char(n)`	Fixed width	Space-padded; rarely wanted

text and varchar perform identically in PG — reach for text unless you need a hard length limit.

Temporal

Type	Stores
`date`	Calendar day, no time
`time [tz]`	Time of day
`timestamp`	Date+time, no zone (naive)
`timestamptz`	Zone-aware — use this
`interval`	Duration (`'2 days 3 hours'`)

Default to timestamptz. It stores UTC and converts on display; timestamp drops zone context and causes off-by-hours bugs.

Boolean, UUID & semi-structured

Type	Values / use
`boolean`	`TRUE` / `FALSE` / `NULL`
`uuid`	128-bit id; `gen_random_uuid()`
`bytea`	Raw binary
`enum`	`CREATE TYPE mood AS ENUM(…)`
`jsonb`	Indexed JSON; `->` `->>` `@>`
`type[]`	Array: `int[]`, `text[]`
`tsvector` / `int4range`	Full-text / range types

Casting & literals

SELECT '2026-06-29'::date, -- PG cast shorthand :: CAST('42' AS int), -- SQL-standard cast 3.0::numeric / 2, -- 1.5 (int/int truncates to 1) '{1,2,3}'::int[], -- array literal '{"a":1}'::jsonb ->> 'a'; -- → '1' (text)

Integer division truncates. 5/2 = 2. Force real division by casting one side: 5::numeric/2 = 2.5.

Type drives operators, indexes & comparison semantics — choose deliberately.postgresql.org/docs/18/datatype.html

postgres=#-- WHERE · NULL · CASE█03 / 09

Filtering, NULL Logic & CASE

WHERE keeps a row only when its condition is TRUE. NULL turns Boolean logic from two values into three — the source of more silent bugs than any other SQL feature.

Predicate toolkit

Predicate	Matches	Example
`= <> < > <= >=`	Comparison	`price >= 100`
`BETWEEN a AND b`	Inclusive range	`age BETWEEN 18 AND 65`
`IN (…)`	Any of a set	`state IN ('AP','TS')`
`LIKE` / `ILIKE`	Pattern (`%` `_`); ILIKE case-insensitive	`name ILIKE 'kal%'`
`~` / `~*`	POSIX regex / case-insensitive	`code ~ '^[A-Z]{3}$'`
`IS [NOT] NULL`	Null test	`deleted_at IS NULL`
`IS [NOT] DISTINCT FROM`	NULL-safe equality	`a IS NOT DISTINCT FROM b`

Three-valued logic

Any comparison with NULL yields UNKNOWN. A row passes WHERE only on TRUE.

AND	T	F	NULL
T	T	F	NULL
F	F	F	F
NULL	NULL	F	NULL

OR	T	F	NULL
T	T	T	T
F	T	F	NULL
NULL	T	NULL	NULL

NULL traps & fixes

Trap	Fix
`x = NULL` never true	`x IS NULL`
`NOT IN (…,NULL)` → 0 rows	Filter NULLs / `NOT EXISTS`
`COUNT(col)` skips NULLs	`COUNT(*)` for total rows
`a = b` false when both NULL	`IS NOT DISTINCT FROM`
`'x'\|\|NULL` → NULL	`COALESCE(col,'')`

UNIQUE allows many NULLs. NULL ≠ NULL, so duplicate NULLs pass a unique constraint unless you add NULLS NOT DISTINCT (PG 15+).

Conditional expressions

-- CASE: the if/else of SQL SELECT name, CASE WHEN score >= 90 THEN 'A' WHEN score >= 75 THEN 'B' ELSE 'C' END AS grade FROM students; -- NULL helpers COALESCE(a, b, 0) -- first non-NULL argument NULLIF(x, 0) -- NULL if x=0 (guard division) GREATEST(a,b,c) / LEAST(a,b,c) -- ignore NULLs

Conditional aggregation — pivot without a pivot: SUM(CASE WHEN region='East' THEN sales ELSE 0 END). Cleaner: SUM(sales) FILTER (WHERE region='East').

WHERE filters rows (pre-group); HAVING filters groups (post-group).postgresql.org/docs/18/functions-conditional.html

postgres=#\df -- functions█04 / 09

Functions & Operators

The working toolkit, grouped by what you're transforming. Scalar functions act per row; aggregates collapse many rows into one.

String

Function	Result
`a \|\| b`	Concatenate
`length(s)`	Character count
`lower / upper / initcap`	Case transforms
`trim / ltrim / rtrim`	Strip whitespace
`substring(s,from,len)`	Extract slice
`replace(s,old,new)`	Substitute
`split_part(s,delim,n)`	Nth field
`position(sub in s)`	Index of substring
`format('%s=%s',k,v)`	printf-style build
`string_agg(x,',')`	aggregate: join rows

Date / time

Function	Result
`now() / current_date`	Current moment / day
`age(ts)`	Interval from now
`date_trunc('month',ts)`	Round down to unit
`extract(year from ts)`	Pull a field
`ts + interval '1 day'`	Date arithmetic
`to_char(ts,'YYYY-MM')`	Format to text
`to_date(s,'DD/MM/YYYY')`	Parse from text
`generate_series(a,b,step)`	Row-set of values

Math & conversion

Function	Result
`round(x,n) / trunc`	Round / truncate
`ceil / floor / abs`	Standard math
`mod(a,b) / a % b`	Remainder
`power(a,b) / sqrt`	Exponent / root
`random()`	0 ≤ x < 1
`x::type / cast(x as t)`	Convert type

Aggregates

Function	Result
`count(*) / count(col)`	All rows / non-NULL
`sum / avg / min / max`	Standard rollups
`array_agg(x)`	Collect into array
`string_agg(x,sep)`	Collect into string
`jsonb_agg(x)`	Collect into JSON
`bool_and / bool_or`	Logical rollup
`percentile_cont(0.5) WITHIN GROUP (ORDER BY x)`	Median

FILTER beats CASE inside aggregates. Instead of SUM(CASE WHEN paid THEN amt END), write SUM(amt) FILTER (WHERE paid) — clearer and NULL-clean. Any aggregate accepts FILTER (WHERE …).

DISTINCT inside an aggregate. COUNT(DISTINCT user_id) counts unique values; plain COUNT(user_id) counts non-NULL occurrences — they differ whenever duplicates exist.

Scalar = per row · Aggregate = many → one · both composable with window OVER.postgresql.org/docs/18/functions.html

postgres=#\d -- combine relations█05 / 09

JOINs, Set Ops & Subqueries

Three ways to combine relations: JOINs widen rows (more columns), set operations stack rows (same columns), subqueries nest one query inside another.

Join grammar

from_item = source { join } ; join = ( [INNER] | (LEFT|RIGHT|FULL) [OUTER] ) JOIN source ( ON cond | USING "("col…")" ) | CROSS JOIN source | NATURAL … JOIN source | [LATERAL] … ;

Join	Keeps	Unmatched rows
`INNER`	Matching pairs only	Dropped both sides
`LEFT OUTER`	All left + matches	Left kept, right = NULL
`RIGHT OUTER`	All right + matches	Right kept, left = NULL
`FULL OUTER`	Everything	Either side padded NULL
`CROSS`	Every combination	Cartesian product
`LATERAL`	Per-row correlated subquery	Right may reference left's columns

ON vs USING vs NATURAL

ON a.id=b.id — explicit, both columns survive.
USING (id) — equal-named cols, merged to one.
NATURAL JOIN — auto-matches all shared names; convenient but fragile — a new shared column silently changes results.

Set operations

q1 UNION [ALL] q2 -- combine q1 INTERSECT q2 -- in both q1 EXCEPT q2 -- in q1 not q2

Same column count & compatible types. Default dedups; ALL keeps dups. INTERSECT binds tighter than UNION/EXCEPT.

Subqueries — four shapes

-- 1. Scalar: returns one value, usable anywhere a value fits SELECT name, salary, (SELECT avg(salary) FROM emp) AS company_avg FROM emp; -- 2. IN / ANY / ALL: membership against a column of values SELECT * FROM emp WHERE dept_id IN (SELECT id FROM dept WHERE region='East'); -- 3. EXISTS: correlated true/false test — stops at first match, NULL-safe SELECT * FROM dept d WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id); -- 4. Derived table: a subquery in FROM, aliased like a table SELECT region, max_sal FROM (SELECT region, max(salary) max_sal FROM emp GROUP BY region) t;

EXISTS over IN for NULL safety. NOT IN (subquery) returns zero rows if the subquery yields a single NULL. NOT EXISTS has no such trap and usually plans better for anti-joins.

JOIN = wider rows · SET = stacked rows · SUBQUERY = nested logic. Correlated subqueries re-run per outer row.postgresql.org/docs/18/queries-table-expressions.html

postgres=#-- collapse vs keep█06 / 09

GROUP BY vs Window Functions

Same arithmetic, opposite shapes. GROUP BY folds many rows into one summary. A window function keeps every row and attaches a value computed over a window of related rows.

GROUP BY — collapses (N → G)

SELECT dept, SUM(sales) AS total FROM s GROUP BY dept;

dept	total
East	400
West	800

Employees gone — you can't also select employee; no single value per group.

WINDOW — preserves (N → N)

SELECT employee, dept, sales, SUM(sales) OVER (PARTITION BY dept) AS dt FROM s;

emp	dept	sales	dt
Asha	East	100	400
Bina	East	300	400
Deepa	West	600	800

Every row survives and knows its group total. Detail and aggregate together.

The OVER clause grammar

window_call = func "("[args]")" [ FILTER "("WHERE cond")" ] OVER ( window_def ) ; window_def = [ PARTITION BY expr… ] -- like GROUP BY, rows NOT collapsed [ ORDER BY sort… ] -- order for ranking & running totals [ frame ] ; frame = (ROWS|RANGE|GROUPS) BETWEEN start AND end ; start/end = UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING ;

Window function catalog

Function	Returns	Needs ORDER BY
`ROW_NUMBER()`	Unique sequential index	Yes
`RANK() / DENSE_RANK()`	Rank with / without gaps after ties	Yes
`LAG(x,n) / LEAD(x,n)`	Value n rows back / ahead	Yes
`SUM(x) OVER(ORDER BY …)`	Running / cumulative total	Yes
`NTILE(k)`	Split into k buckets	Yes
`FIRST_VALUE / LAST_VALUE / NTH_VALUE`	Value at frame edge / position	Usually

Decision rule. One row per group (a report)? → GROUP BY. Every row plus a peer-computed value (rank, running total, share, diff-from-avg)? → window OVER (…). Windowing runs after grouping, so the two can coexist.

Math: GROUP BY = γ in bag algebra (many→one). Window = per-row map over an ordered context (one→one). Neither is in pure relational calculus.postgresql.org/docs/18/tutorial-window.html

postgres=#WITH RECURSIVE …█07 / 09

CTEs & Recursion

A Common Table Expression names a subquery up front, so complex logic reads top-to-bottom instead of inside-out. RECURSIVE extends this to hierarchies and graphs — the standard way to walk a tree.

Grammar

with_clause = WITH [ RECURSIVE ] cte {"," cte} ; cte = name [ "("col…")" ] AS [ [NOT] MATERIALIZED ] "(" query ")" ;

Plain CTE — readability

WITH regional AS ( SELECT region, SUM(sales) total FROM s GROUP BY region ) SELECT * FROM regional WHERE total > 500 ORDER BY total DESC;

Name the intermediate result once, reuse it. Chains of CTEs read as sequential steps.

MATERIALIZED control

PG may inline a CTE into the outer query for speed. Force "compute once" with AS MATERIALIZED, or allow inlining with NOT MATERIALIZED. Use MATERIALIZED when the CTE is expensive and referenced multiple times.

Recursive CTE — walking a hierarchy

Two parts joined by UNION ALL: a base case (anchor) and a recursive case that references the CTE itself. PG iterates until the recursive part returns no new rows.

-- employees(id, name, manager_id) → everyone under Asha, with depth WITH RECURSIVE chain AS ( -- base case: the starting node SELECT id, name, manager_id, 1 AS depth FROM employees WHERE name = 'Asha' UNION ALL -- recursive case: children of rows already found SELECT e.id, e.name, e.manager_id, c.depth + 1 FROM employees e JOIN chain c ON e.manager_id = c.id ) SELECT repeat(' ', depth-1) || name AS tree, depth FROM chain ORDER BY depth;

Input: employees

id	name	mgr
1	Asha	—
2	Bina	1
3	Chandu	1
4	Deepa	2

Output: chain

tree	depth
Asha	1
Bina	2
Chandu	2
Deepa	3

Guard against infinite loops. On cyclic graphs an unguarded recursive CTE never terminates. PG 14+ offers UNION … CYCLE col SET is_cycle USING path; otherwise track a visited path array and filter.

Data-modifying CTEs (PG extension). WITH moved AS (DELETE FROM old WHERE … RETURNING *) INSERT INTO archive SELECT * FROM moved; — delete and re-insert atomically in one statement.

CTEs = top-down readability · RECURSIVE = anchor + UNION ALL self-reference until fixpoint.postgresql.org/docs/18/queries-with.html

postgres=#\h -- DML·DDL·TCL·DCL█08 / 09

Beyond SELECT: Write, Define, Control

DML changes rows · DDL changes structure · TCL controls transactions · DCL controls access.

DML — manipulate data

insert = INSERT INTO table ["("col…")"] ( VALUES row… | select_query ) [ ON CONFLICT [target] ( DO NOTHING | DO UPDATE SET col"="EXCLUDED.col ) ] -- UPSERT [ RETURNING ("*"|expr…) ] ; -- RETURNING = PG ext update = UPDATE table SET col"="expr… [ FROM from_item ] [ WHERE cond ] [ RETURNING … ] ; delete = DELETE FROM table [ USING from_item ] [ WHERE cond ] [ RETURNING … ] ; merge = MERGE INTO tgt USING src ON cond { WHEN [NOT] MATCHED THEN (INSERT|UPDATE|DELETE) } ;

WHERE-less UPDATE/DELETE hits every row. Wrap risky writes in BEGIN; … ROLLBACK; to preview the row count before committing.

DDL — define structure

create_table = CREATE [TEMP|UNLOGGED] TABLE [IF NOT EXISTS] name "(" col type {constraint} {"," …} ")" [ PARTITION BY (RANGE|LIST|HASH) … ] ; constraint = PRIMARY KEY | UNIQUE [NULLS NOT DISTINCT] | NOT NULL | DEFAULT expr | CHECK "("cond")" | GENERATED ALWAYS AS IDENTITY | REFERENCES tbl [ON DELETE action] ; alter = ALTER TABLE name ( ADD|DROP|ALTER COLUMN … | ADD CONSTRAINT … | RENAME … ) ; drop = DROP (TABLE|VIEW|INDEX) [IF EXISTS] name [CASCADE|RESTRICT] ;

TCL — transactions

BEGIN [ISOLATION LEVEL lvl]; SAVEPOINT sp; ROLLBACK TO sp; COMMIT; ROLLBACK;

Levels: READ COMMITTED (default) · REPEATABLE READ · SERIALIZABLE. ACID is per-transaction.

DCL — access

GRANT priv ON obj TO role; REVOKE priv ON obj FROM role;

priv = SELECT/INSERT/UPDATE/DELETE/ALL. Row-level security via CREATE POLICY.

RETURNING, ON CONFLICT, data-modifying CTEs & NULLS NOT DISTINCT are PostgreSQL extensions.postgresql.org/docs/18/sql-commands.html

postgres=#EXPLAIN ANALYZE …█09 / 09

Indexes & EXPLAIN Performance

An index is a shortcut the planner may use. Knowing which index suits which query — and reading the plan to confirm it's used — is the difference between a 5ms and a 5s query.

Index types — which when

Type	Best for	Example
B-tree (default)	Equality & range on scalars; sorting	`WHERE price > 100 ORDER BY price`
Hash	Equality only	`WHERE id = 42`
GIN	Multi-value: jsonb, arrays, full-text	`WHERE tags @> '{sql}'`
GiST	Geometric, ranges, nearest-neighbour	`WHERE geom && box`
BRIN	Huge, naturally-ordered tables (time-series)	`WHERE created_at > …`
SP-GiST	Non-balanced: quadtrees, IP ranges	`WHERE inet && '10.0.0.0/8'`

Targeted index recipes

-- composite: order matters (leftmost prefix) CREATE INDEX ON orders (cust_id, created_at); -- partial: index only rows you query CREATE INDEX ON users (email) WHERE active; -- expression: match the WHERE expression CREATE INDEX ON users (lower(email)); -- covering: serve query from index alone CREATE INDEX ON orders (cust_id) INCLUDE (total);

Why an index gets ignored

• Function on the column: WHERE lower(email)=… skips a plain email index — index the expression instead.
• Leading wildcard: LIKE '%x' can't use B-tree.
• Type mismatch forces a cast.
• Tiny table: a seq scan is genuinely faster.
• Stale stats: run ANALYZE.

Reading EXPLAIN

EXPLAIN [ ANALYZE ] [ BUFFERS ] [ FORMAT (TEXT|JSON) ] statement ; -- ANALYZE actually runs it, showing real rows + timing (use a transaction for writes). Index Scan using orders_cust_idx on orders (cost=0.42..8.4 rows=1 width=36) (actual time=0.018..0.020 rows=1 loops=1) ← actual ≈ estimate = healthy

Plan node	Meaning	Concern
`Seq Scan`	Read whole table	Bad on large tables w/ selective filter
`Index Scan / Index Only Scan`	Use index; "Only" = no heap fetch	Good
`Bitmap Heap Scan`	Many matching rows via index	Fine for medium selectivity
`Nested Loop`	Probe inner per outer row	Blows up if outer rows large
`Hash Join / Merge Join`	Build hash / merge sorted inputs	Good for large joins

The estimate-vs-actual gap is the tell. When the planner expected rows=1 but got rows=50000, its choices rested on wrong stats — run ANALYZE, then match the index to the actual WHERE/JOIN/ORDER BY columns.

Version note. Verified on PostgreSQL 18 (stable 18.4, May 2026); v19 GA expected Sept 2026 — additive only. Always run the latest minor of your major version.

Index = optional shortcut · EXPLAIN ANALYZE = ground truth · match index to real query shape.postgresql.org/docs/18/using-explain.html

PostgreSQL Master Reference

The SELECT Statement — Grammar & Execution Flow

Complete SELECT grammar (written order)

The execution pipeline — tables in, table out

Relational-algebra equivalent

Data Types — the Vocabulary

Numeric

Text

Temporal

Boolean, UUID & semi-structured

Casting & literals

Filtering, NULL Logic & CASE

Predicate toolkit

Three-valued logic

NULL traps & fixes

Conditional expressions

Functions & Operators

String

Date / time

Math & conversion

Aggregates

JOINs, Set Ops & Subqueries

Join grammar

ON vs USING vs NATURAL

Set operations

Subqueries — four shapes

GROUP BY vs Window Functions

GROUP BY — collapses (N → G)

WINDOW — preserves (N → N)

The OVER clause grammar

Window function catalog

CTEs & Recursion

Grammar

Plain CTE — readability

MATERIALIZED control

Recursive CTE — walking a hierarchy

Input: employees

Output: chain

Beyond SELECT: Write, Define, Control

DML — manipulate data

DDL — define structure

TCL — transactions

DCL — access

Indexes & EXPLAIN Performance

Index types — which when

Targeted index recipes

Why an index gets ignored

Reading EXPLAIN