postgres=#\h SELECT█
Poster 1 / 6
The SELECT Statement, Dissected
The complete query grammar in EBNF, ordered the way you write it. Keywords in
amber, grammar metasymbols in teal,
PostgreSQL-only extensions flagged in rust. Notation:
[ ] optional · { } repeat 0+ · | choice · ( ) group.
nonterminal
SQL keyword (terminal)
EBNF metasymbol
PostgreSQL extension
comment
Top-level query grammar
-- A full query = optional CTEs, a set-algebra body, then presentation clauses
select_query = [ with_clause ] query_body
[ ORDER BY sort_item { "," sort_item } ]
[ limit_clause ] [ locking_clause ] ";" ;
query_body = query_term { ( UNION | EXCEPT ) [ ALL | DISTINCT ] query_term } ;
query_term = query_prim { INTERSECT [ ALL | DISTINCT ] query_prim } ; -- INTERSECT binds tighter
query_prim = select_block | TABLE table_name | VALUES row_list | "(" query_body ")" ;
The SELECT block — every clause, in written order
select_block = SELECT [ ALL | DISTINCT [ ON "(" expr {"," expr} ")" ] ] -- DISTINCT ON = PG ext
select_list
[ FROM from_item { "," from_item } ] -- PG: FROM is OPTIONAL (SELECT 1+1)
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] group_item { "," group_item } ]
[ HAVING condition ]
[ WINDOW win_name AS "(" window_def ")" { "," … } ] ;
select_list = "*" | select_item { "," select_item } ;
select_item = ( expr [ [ AS ] column_alias ] ) | ( table_name ".*" ) ;
group_item = expr | "()" | ROLLUP paren_list | CUBE paren_list | GROUPING SETS "(" … ")" ;
sort_item = expr [ ASC | DESC | USING operator ] [ NULLS ( FIRST | LAST ) ] ;
Pagination & locking
-- Two equivalent dialects. LIMIT/OFFSET = PG & MySQL; OFFSET…FETCH = SQL:2008 standard.
limit_clause = ( LIMIT ( count | ALL ) [ OFFSET start ] )
| ( OFFSET start [ ROW | ROWS ]
[ FETCH ( FIRST | NEXT ) [ count ] ( ROW | ROWS ) ( ONLY | WITH TIES ) ] ) ;
locking_clause = FOR ( UPDATE | NO KEY UPDATE | SHARE | KEY SHARE )
[ OF table {"," table} ] [ NOWAIT | SKIP LOCKED ] ;
Standard vs PostgreSQL — three gotchas.
① LIMIT/OFFSET are not in the SQL standard (PG & MySQL only); the portable form is OFFSET … FETCH.
② DISTINCT ON (…) is a PostgreSQL-only superpower — keep the first row per group.
③ PostgreSQL lets you omit FROM entirely (SELECT now();), which the standard forbids.
postgres=#EXPLAIN SELECT …█
Poster 2 / 6
The Logical Data-Flow: Tables In, Table Out
SQL is written in one order but executed in another. Each station takes a table (a multiset of
rows) and emits a transformed table. This is why an alias from SELECT works in ORDER BY
but not in WHERE — the WHERE station runs long before SELECT exists.
1
FROM / JOINrelation
Assemble the source. Multiple tables are cross-joined (Cartesian product), then JOIN conditions applied. → working table T₁
2
WHEREselection σ
Keep rows where the predicate is TRUE (not FALSE, not NULL). No aggregates here — groups don't exist yet. σ → T₂
3
GROUP BYgrouping γ
Partition T₂ into groups by equal key values; collapse each group to one row. Cardinality drops. γ → T₃
4
HAVINGselection σ
Filter the group rows — here aggregates like COUNT(*) > 5 are legal because groups now exist. σ → T₄
5
SELECTprojection π
Evaluate output expressions & window functions; assign column aliases. Now the result columns exist. π → T₅
6
DISTINCTdedup δ
Remove duplicate result rows (DISTINCT ON keeps first per key). δ → T₆
7
ORDER BYsort τ
Impose row order. Aliases from step 5 are visible here — the only place ordering is guaranteed. τ → T₇
8
LIMIT / OFFSETslice
Skip OFFSET rows, return at most LIMIT. Final result table delivered to the client. → RESULT
Alias rule, explained by the pipeline. Because WHERE (step 2) runs before
SELECT (step 5), a column alias defined in SELECT is invisible to WHERE and HAVING — you must
repeat the full expression. But ORDER BY (step 7) runs after SELECT, so it can use the alias.
Worked example — trace the table at each station
Query: SELECT dept, SUM(sales) AS total FROM s WHERE region='East' GROUP BY dept HAVING SUM(sales)>150 ORDER BY total DESC;
After FROM + WHERE (T₂)
| dept | region | sales |
| A | East | 100 |
| A | East | 80 |
| B | East | 120 |
| C | West | 90 |
Row C/West fails the predicate — dropped before grouping.
After GROUP BY + HAVING + ORDER (RESULT)
A=180 and B=120 are the two groups; HAVING>150 keeps only A. One group row per dept — the rows collapsed.
Relational-algebra equivalent
-- The whole query as a composed expression (read inside-out):
τtotal↓ ( σSUM(sales)>150 ( deptγSUM(sales)→total ( σregion='East' ( s ) ) ) )
sort HAVING filter GROUP BY + aggregate WHERE filter source
Each Greek letter is one pipeline station: σ select/filter · π project · γ group-aggregate · τ sort · δ distinct · ⋈ join.
postgres=#-- collapse vs keep█
Poster 3 / 6
GROUP BY vs Window Functions
Same arithmetic, opposite shapes. GROUP BY folds many rows into one summary row.
A window function keeps every row and attaches a value computed over a "window" of related rows.
GROUP BY — collapses (N rows → G groups)
SELECT dept, SUM(sales) AS total
FROM s GROUP BY dept;
Individual employees are gone. You cannot also select employee — no single value per group.
WINDOW — preserves (N rows → N rows)
SELECT employee, dept, sales,
SUM(sales) OVER (PARTITION BY dept) AS dt
FROM s;
| emp | dept | sales | dt |
| Asha | East | 100 | 400 |
| Bina | East | 300 | 400 |
| Deepa | West | 600 | 800 |
Every row survives and "knows" its group total. Detail and aggregate side by side.
Window function grammar (the OVER clause)
window_call = func "(" [ args ] ")" [ FILTER "(" WHERE cond ")" ] OVER ( win_name | "(" window_def ")" ) ;
window_def = [ existing_name ]
[ PARTITION BY expr {"," expr} ] -- like GROUP BY, but rows are NOT collapsed
[ ORDER BY sort_item {"," sort_item} ] -- defines order for ranking & running totals
[ frame ] ;
frame = ( RANGE | ROWS | GROUPS )
( frame_start | BETWEEN frame_start AND frame_end )
[ EXCLUDE ( CURRENT ROW | GROUP | TIES | NO OTHERS ) ] ;
frame_start = UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING ;
What only window functions can do
| Function | Returns | Needs ORDER BY? |
ROW_NUMBER() | Unique sequential rank per partition | Yes |
RANK() / DENSE_RANK() | Rank with / without gaps after ties | Yes |
LAG(x,n) / LEAD(x,n) | Value from a row n positions back / ahead | Yes |
SUM(x) OVER (ORDER BY …) | Running / cumulative total | Yes (defines the running order) |
NTILE(k) | Bucket rows into k roughly equal tiles | Yes |
FIRST_VALUE/LAST_VALUE/NTH_VALUE | Value at frame edge / nth position | Usually |
Decision rule. Want one row per group (a report)? → GROUP BY.
Want every original row plus a value computed from its peers (rank, running total, share-of-total,
diff-from-average)? → window function with OVER (…). Both can coexist:
windowing runs after grouping (pipeline step 5), so a window function may aggregate over already-grouped rows.
postgres=#\d joins█
Poster 4 / 6
JOINs & Set Operations
How two tables combine — horizontally by matching columns (joins) or vertically by stacking rows (set ops).
Join grammar (the FROM item)
from_item = source { join } ;
source = [ ONLY ] table_name [ "*" ] [ [ AS ] alias ] -- ONLY = exclude inheritance children
| [ LATERAL ] "(" select_query ")" [ AS ] alias -- derived table / subquery
| [ LATERAL ] func_call [ WITH ORDINALITY ] [ AS ] alias -- table function
| VALUES row_list [ AS ] alias ;
join = ( [ INNER ] | ( LEFT | RIGHT | FULL ) [ OUTER ] ) JOIN source join_cond
| CROSS JOIN source
| NATURAL [ … ] JOIN source ;
join_cond = ON condition | USING "(" column {"," column} ")" [ AS alias ] ;
Join types at a glance
| Type | Keeps | Result |
INNER JOIN | Only matching pairs | Rows where the condition holds in both tables |
LEFT [OUTER] JOIN | All left + matches | Unmatched left rows padded with NULL on the right |
RIGHT [OUTER] JOIN | All right + matches | Mirror of LEFT |
FULL [OUTER] JOIN | All rows, both sides | Unmatched rows on either side padded with NULL |
CROSS JOIN | Every combination | Cartesian product (rowsL × rowsR); no condition |
LATERAL | Correlated subquery | Right side may reference columns of left side, evaluated per row |
ON vs USING vs NATURAL
ON a.id=b.id — explicit, keeps both columns.
USING (id) — equal-named columns, merges them into one.
NATURAL JOIN — auto-matches all same-named columns. Convenient but fragile: a new shared column silently changes results.
Set operations
q1 UNION [ALL] q2 -- combine; ALL keeps dups
q1 INTERSECT q2 -- rows in both
q1 EXCEPT q2 -- in q1, not q2
Both sides must have the same column count & compatible types. Default removes duplicates; add ALL to keep them. Precedence: INTERSECT > UNION/EXCEPT.
NULL ≠ NULL in joins. A join condition a.x = b.x never matches when either side is NULL
(the comparison yields UNKNOWN). Use IS NOT DISTINCT FROM if you need NULLs to match each other.
postgres=#\h -- DML · DDL · TCL█
Poster 5 / 6
Beyond SELECT: Writing & Defining Data
The other statement families, each in EBNF. DML changes rows, DDL changes structure,
TCL controls transactions, DCL controls access.
DML — Data Manipulation
insert = INSERT INTO table [ "(" column {"," column} ")" ]
( VALUES row {"," row} | select_query | DEFAULT VALUES )
[ ON CONFLICT [ target ] ( DO NOTHING | DO UPDATE SET … ) ] -- UPSERT (PG ext)
[ RETURNING ( "*" | expr_list ) ] ; -- RETURNING = PG ext
update = UPDATE [ ONLY ] table [ AS alias ] SET col "=" expr {"," …}
[ FROM from_item ] [ WHERE condition ] [ RETURNING … ] ;
delete = DELETE FROM [ ONLY ] table [ USING from_item ] [ WHERE condition ] [ RETURNING … ] ;
merge = MERGE INTO target USING source ON cond
{ WHEN [ NOT ] MATCHED [ AND cond ] THEN ( INSERT … | UPDATE … | DELETE | DO NOTHING ) } ;
DDL — Data Definition
create_table = CREATE [ TEMP | UNLOGGED ] TABLE [ IF NOT EXISTS ] name
"(" column_def {"," column_def | table_constraint} ")"
[ PARTITION BY ( RANGE | LIST | HASH ) "(" … ")" ] ;
column_def = name data_type { column_constraint } ;
column_constraint = PRIMARY KEY | UNIQUE | NOT NULL | NULL
| DEFAULT expr | CHECK "(" cond ")"
| GENERATED ( ALWAYS | BY DEFAULT ) AS ( IDENTITY | "("expr")" STORED )
| REFERENCES table [ "("col")" ] [ ON DELETE action ] ;
alter_table = ALTER TABLE [ IF EXISTS ] name action {"," action} ;
action = ADD COLUMN … | DROP COLUMN … | ALTER COLUMN … | ADD CONSTRAINT … | RENAME … ;
drop = DROP ( TABLE | VIEW | INDEX | … ) [ IF EXISTS ] name [ CASCADE | RESTRICT ] ;
TCL — Transaction Control
BEGIN [ISOLATION LEVEL lvl];
SAVEPOINT name;
RELEASE SAVEPOINT name;
ROLLBACK [TO SAVEPOINT name];
COMMIT;
Isolation levels: READ COMMITTED (default) · REPEATABLE READ · SERIALIZABLE.
DCL — Access Control
GRANT priv ON obj TO role
[WITH GRANT OPTION];
REVOKE priv ON obj FROM role;
priv = SELECT/INSERT/UPDATE/DELETE/ALL. Manage roles with CREATE ROLE.
WITH (Common Table Expressions) attach to any of the above:
WITH [RECURSIVE] name AS [[NOT] MATERIALIZED] (query) …. RECURSIVE enables
tree/graph traversal; PG can also run data-modifying CTEs (WITH x AS (DELETE … RETURNING …)).
postgres=#SHOW server_version; -- 18.4█
Poster 6 / 6
The PostgreSQL Edge & Clause Cheat-Grid
What makes Postgres Postgres — the features the SQL standard doesn't give you — plus a one-glance
reference for clause order and where each name is visible.
Master clause-order grid
| # | Clause | Job | Aliases visible? | Aggregates allowed? |
| 1 | FROM / JOIN | Assemble & combine sources | — | No |
| 2 | WHERE | Filter rows | No (use full expr) | No |
| 3 | GROUP BY | Collapse into groups | Output name OR input expr | No (defines them) |
| 4 | HAVING | Filter groups | No | Yes |
| 5 | SELECT + window | Project, compute, window | Defines them | Yes |
| 6 | DISTINCT | Dedup results | Yes | — |
| 7 | ORDER BY | Sort | Yes (prefers output name) | Yes |
| 8 | LIMIT / OFFSET | Slice | — | — |
PostgreSQL extensions worth knowing
Query & write
EXT DISTINCT ON (k) — first row per key.
EXT RETURNING * on INSERT/UPDATE/DELETE.
EXT ON CONFLICT … DO UPDATE — true UPSERT.
EXT Data-modifying CTEs.
EXT LIMIT/OFFSET (standard uses FETCH).
STD MERGE, GROUPING SETS, LATERAL.
Types & indexing
JSONB with ->, ->>, @>, #> operators.
Arrays, hstore, ranges, UUID, tsvector full-text.
Index types: B-tree, GIN (jsonb/arrays), GiST, BRIN, Hash, SP-GiST.
Partial & expression indexes: CREATE INDEX … WHERE active.
Generated columns, table partitioning, table inheritance.
Predicate & NULL logic — the quiet bugs
| Write this | Not this | Why |
x IS NULL | x = NULL | = NULL is always UNKNOWN → never true |
x IS NOT DISTINCT FROM y | x = y (with NULLs) | NULL-safe equality; treats NULL=NULL as true |
COUNT(*) | COUNT(col) for totals | COUNT(col) skips NULLs in that column |
WHERE x IN (…) | x IN (…, NULL) for NOT IN | NOT IN with a NULL in the list returns no rows |
COALESCE(x, 0) | arithmetic on raw NULL | Any operation with NULL yields NULL |
Reading the plan — your debugging lens
EXPLAIN [ ANALYZE ] [ BUFFERS ] [ FORMAT ( TEXT | JSON | YAML ) ] statement ;
-- ANALYZE actually runs the query & shows real row counts + timing.
-- Watch for: Seq Scan on big tables, rows estimate vs actual gap, nested-loop blowups.
Version note. Grammar verified against PostgreSQL 18 (current stable: 18.4, May 2026);
PostgreSQL 19 is in beta with GA expected Sept 2026. All EBNF here is forward-compatible — 19 adds features,
it doesn't remove these. Always run the latest minor release of your major version.