postgres=# \h SELECT   [01 / 09]

The SELECT Statement — Grammar & Execution Flow

SQL is written in one order and executed in another. The grammar says what's legal; the pipeline says why — why an alias works in ORDER BY but not WHERE, why aggregates are banned in WHERE but allowed in HAVING.


Complete SELECT grammar (written order)

select_query = [ with_clause ] query_body
               [ ORDER BY sort_item {"," sort_item} ]
               [ limit_clause ] [ lock ] ";" ;
query_body   = query_term { ( UNION | EXCEPT ) [ ALL ] query_term } ;
query_term   = query_prim { INTERSECT [ ALL ] query_prim } ;
query_prim   = select_block | VALUES rows | TABLE name | "(" query_body ")" ;
select_block = SELECT [ ALL | DISTINCT [ ON "(" expr {"," expr} ")" ] ] select_list
               [ FROM from_item {"," from_item} ]        -- PG: FROM optional → SELECT now();
               [ WHERE condition ]
               [ GROUP BY group_item {"," group_item} ]
               [ HAVING condition ]
               [ WINDOW name AS "(" window_def ")" ] ;
limit_clause = LIMIT (n|ALL) [ OFFSET m ]
             | OFFSET m [ FETCH (FIRST|NEXT) n ROWS (ONLY|WITH TIES) ] ;

The execution pipeline — tables in, table out

1. FROM / JOIN (relation ⋈): Assemble sources; multiple tables cross-join, then JOIN conditions apply. → T₁
2. WHERE (select σ): Keep rows where the predicate is TRUE (not FALSE/NULL). No aggregates, because groups don't exist yet. → T₂
3. GROUP BY (group γ): Partition by key; collapse each group to one row. Cardinality drops. → T₃
4. HAVING (select σ): Filter group rows; aggregates are legal here because groups now exist. → T₄
5. SELECT (project π): Evaluate output expressions & window functions; assign aliases. Result columns now exist. → T₅
6. DISTINCT (dedup δ): Remove duplicate result rows. → T₆
7. ORDER BY (sort τ): Impose order; aliases from step 5 are visible here. This is the only place order is guaranteed. → T₇
8. LIMIT / OFFSET (slice): Skip OFFSET, return at most LIMIT. Delivered to client. → RESULT

The alias rule, explained by the pipeline. WHERE (2) runs before SELECT (5), so a SELECT alias is invisible to WHERE/HAVING — repeat the full expression. ORDER BY (7) runs after SELECT, so it can use the alias.
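The alias rule can be checked with a small self-contained query. This is a sketch only: the inline table s and the alias doubled are made up for illustration and are not part of the original deck.

```sql
-- Hypothetical inline data; "doubled" is an illustrative alias.
WITH s(dept, sales) AS (
  VALUES ('East', 100), ('East', 300), ('West', 600)
)
SELECT dept, sales * 2 AS doubled
FROM s
WHERE sales * 2 > 250      -- step 2: the alias does not exist yet, so repeat the expression
ORDER BY doubled DESC;     -- step 7: the alias from step 5 is visible

-- Writing WHERE doubled > 250 instead fails with:
--   ERROR:  column "doubled" does not exist
```

If repeating the expression gets unwieldy, wrapping the query in a derived table or CTE makes the alias an ordinary column for the outer WHERE.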

Relational-algebra equivalent

SELECT dept, SUM(sales) AS total
FROM s
WHERE region='East'
GROUP BY dept
HAVING SUM(sales)>150
ORDER BY total DESC;

τ_{total↓} (                          sort
  π_{dept,total} (                    project
    σ_{SUM>150} (                     HAVING
      _{dept}γ_{SUM(sales)→total} (   GROUP + aggregate
        σ_{region='East'} (           WHERE
          s ) ) ) ) )                 source

Three PostgreSQL divergences. ① LIMIT isn't standard SQL (portable: OFFSET … FETCH FIRST n ROWS ONLY). ② DISTINCT ON (k) keeps the first row per key, as decided by ORDER BY; PG-only. ③ FROM may be omitted entirely.
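The three divergences can each be tried in one statement. A minimal sketch, assuming an inline table s whose rows are illustrative only:

```sql
-- 1. Portable paging: OFFSET ... FETCH instead of LIMIT
SELECT n
FROM generate_series(1, 10) AS g(n)
ORDER BY n
OFFSET 2 ROWS FETCH FIRST 3 ROWS ONLY;        -- returns 3, 4, 5

-- 2. DISTINCT ON: first row per dept, where "first" is decided by ORDER BY
WITH s(dept, employee, sales) AS (
  VALUES ('East', 'Asha', 100), ('East', 'Bina', 300), ('West', 'Deepa', 600)
)
SELECT DISTINCT ON (dept) dept, employee, sales
FROM s
ORDER BY dept, sales DESC;                    -- East/Bina/300, West/Deepa/600

-- 3. FROM omitted entirely
SELECT now();
```

Without an ORDER BY that starts with the DISTINCT ON key, which row counts as "first" is unpredictable.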

Execution order is logical; the planner may reorder physically but results match this model. Docs: postgresql.org/docs/18/sql-select.html

postgres=# \dT -- data types   [02 / 09]

Data Types — the Vocabulary

Every column, expression and comparison has a type. Choosing well prevents silent bugs (float money, naive timestamps) and unlocks the right operators and indexes.

Numeric

Type	Use for	Note
`smallint/int/bigint`	Whole numbers	2 / 4 / 8 bytes
`numeric(p,s)`	Money, exact	Arbitrary precision, no rounding error
`real / double precision`	Scientific	Float — never for currency
`serial / bigserial`	Auto-increment	Legacy; prefer `GENERATED … IDENTITY`
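The "never for currency" note is easy to see with the classic 0.1 + 0.2 comparison. A small sketch using only built-in types:

```sql
SELECT 0.1::double precision + 0.2::double precision = 0.3::double precision
         AS float_equal,      -- false: binary floats cannot represent 0.1 exactly
       0.1::numeric + 0.2::numeric = 0.3::numeric
         AS numeric_equal;    -- true: numeric does exact decimal arithmetic
```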

Text

Type	Use for	Note
`text`	Default choice	Unlimited; no perf penalty in PG
`varchar(n)`	Length-capped	Enforces a max; otherwise = text
`char(n)`	Fixed width	Space-padded; rarely wanted

text and varchar perform identically in PG — reach for text unless you need a hard length limit.

Temporal

Type	Stores
`date`	Calendar day, no time
`time [tz]`	Time of day
`timestamp`	Date+time, no zone (naive)
`timestamptz`	Zone-aware — use this
`interval`	Duration (`'2 days 3 hours'`)

Default to timestamptz. It stores UTC and converts on display; timestamp drops zone context and causes off-by-hours bugs.
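A short sketch of the difference, using the same literal for both types. The session time zones are chosen purely for illustration.

```sql
SET TIME ZONE 'UTC';
SELECT '2026-06-29 12:00:00+05:30'::timestamptz AS with_zone,      -- 2026-06-29 06:30:00+00
       '2026-06-29 12:00:00+05:30'::timestamp   AS without_zone;   -- 2026-06-29 12:00:00 (offset silently ignored)

SET TIME ZONE 'Asia/Kolkata';
SELECT '2026-06-29 12:00:00+05:30'::timestamptz AS with_zone;      -- 2026-06-29 12:00:00+05:30 (same instant, local display)
```

The timestamptz value is the same instant in both sessions; only its rendering changes. The plain timestamp has lost the offset for good.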

Boolean, UUID & semi-structured

Type	Values / use
`boolean`	`TRUE` / `FALSE` / `NULL`
`uuid`	128-bit id; `gen_random_uuid()`
`bytea`	Raw binary
`enum`	`CREATE TYPE mood AS ENUM(…)`
`jsonb`	Binary JSON, GIN-indexable; `->` `->>` `@>`
`type[]`	Array: `int[]`, `text[]`
`tsvector` / `int4range`	Full-text / range types

Casting & literals

SELECT '2026-06-29'::date,          -- PG cast shorthand ::
       CAST('42' AS int),           -- SQL-standard cast
       3.0::numeric / 2,            -- 1.5 (int/int truncates to 1)
       '{1,2,3}'::int[],            -- array literal
       '{"a":1}'::jsonb ->> 'a';    -- → '1' (text)

Integer division truncates. 5/2 = 2. Force real division by casting one side: 5::numeric/2 = 2.5.

Type drives operators, indexes & comparison semantics; choose deliberately. Docs: postgresql.org/docs/18/datatype.html

postgres=# -- WHERE · NULL · CASE   [03 / 09]

Filtering, NULL Logic & CASE

WHERE keeps a row only when its condition is TRUE. NULL turns Boolean logic from two values into three — the source of more silent bugs than any other SQL feature.

Predicate toolkit

Predicate	Matches	Example
`= <> < > <= >=`	Comparison	`price >= 100`
`BETWEEN a AND b`	Inclusive range	`age BETWEEN 18 AND 65`
`IN (…)`	Any of a set	`state IN ('AP','TS')`
`LIKE` / `ILIKE`	Pattern (`%` `_`); ILIKE case-insensitive	`name ILIKE 'kal%'`
`~` / `~*`	POSIX regex / case-insensitive	`code ~ '^[A-Z]{3}$'`
`IS [NOT] NULL`	Null test	`deleted_at IS NULL`
`IS [NOT] DISTINCT FROM`	NULL-safe equality	`a IS NOT DISTINCT FROM b`

Three-valued logic

Any ordinary comparison (=, <>, <, …) with NULL yields UNKNOWN, which SQL displays as NULL. A row passes WHERE only on TRUE.

AND	T	F	NULL
T	T	F	NULL
F	F	F	F
NULL	NULL	F	NULL

OR	T	F	NULL
T	T	T	T
F	T	F	NULL
NULL	T	NULL	NULL
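The truth tables above can be reproduced directly. A minimal sketch; the casts are there only to give the NULL literals a type.

```sql
SELECT (NULL::int = 1)                             AS cmp,            -- NULL (UNKNOWN)
       (NULL::int = NULL::int)                     AS null_eq_null,   -- NULL, not TRUE
       (TRUE  AND NULL::boolean)                   AS t_and_n,        -- NULL
       (FALSE AND NULL::boolean)                   AS f_and_n,        -- FALSE
       (TRUE  OR  NULL::boolean)                   AS t_or_n,         -- TRUE
       (FALSE OR  NULL::boolean)                   AS f_or_n,         -- NULL
       (NULL::int IS NOT DISTINCT FROM NULL::int)  AS null_safe_eq;   -- TRUE
```

FALSE AND NULL is decided by the FALSE and TRUE OR NULL by the TRUE; every other combination involving NULL stays unknown.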

NULL traps & fixes

Trap	Fix
`x = NULL` never true	`x IS NULL`
`NOT IN (…,NULL)` → 0 rows	Filter NULLs / `NOT EXISTS`
`COUNT(col)` skips NULLs	`COUNT(*)` for total rows
`a = b` is NULL (never TRUE) when both are NULL	`IS NOT DISTINCT FROM`
`'x'||NULL` → NULL	`COALESCE(col,'')`

UNIQUE allows many NULLs. NULL ≠ NULL, so duplicate NULLs pass a unique constraint unless you add NULLS NOT DISTINCT (PG 15+).

Conditional expressions

-- CASE: the if/else of SQL
SELECT name,
       CASE WHEN score >= 90 THEN 'A'
            WHEN score >= 75 THEN 'B'
            ELSE 'C'
       END AS grade
FROM students;

-- NULL helpers
COALESCE(a, b, 0)                  -- first non-NULL argument
NULLIF(x, 0)                       -- NULL if x=0 (guard division)
GREATEST(a,b,c) / LEAST(a,b,c)     -- ignore NULLs

Conditional aggregation — pivot without a pivot: SUM(CASE WHEN region='East' THEN sales ELSE 0 END). Cleaner: SUM(sales) FILTER (WHERE region='East').
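Both spellings side by side, as a sketch over illustrative inline data. The 'North' column is added only to show how the two forms differ when nothing matches.

```sql
-- Hypothetical inline data.
WITH s(region, sales) AS (
  VALUES ('East', 100), ('East', 300), ('West', 600), ('West', 200)
)
SELECT SUM(CASE WHEN region = 'East' THEN sales ELSE 0 END)  AS east_case,     -- 400
       SUM(sales) FILTER (WHERE region = 'East')             AS east_filter,   -- 400
       SUM(CASE WHEN region = 'North' THEN sales ELSE 0 END) AS north_case,    -- 0
       SUM(sales) FILTER (WHERE region = 'North')            AS north_filter   -- NULL: no rows matched
FROM s;
```

Wrap the FILTER form in COALESCE(…, 0) if a zero is wanted for empty groups.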

WHERE filters rows (pre-group); HAVING filters groups (post-group). Docs: postgresql.org/docs/18/functions-conditional.html

postgres=# \df -- functions   [04 / 09]

Functions & Operators

The working toolkit, grouped by what you're transforming. Scalar functions act per row; aggregates collapse many rows into one.

String

Function	Result
`a || b`	Concatenate
`length(s)`	Character count
`lower / upper / initcap`	Case transforms
`trim / ltrim / rtrim`	Strip whitespace
`substring(s,from,len)`	Extract slice
`replace(s,old,new)`	Substitute
`split_part(s,delim,n)`	Nth field
`position(sub in s)`	Index of substring
`format('%s=%s',k,v)`	printf-style build
`string_agg(x,',')`	aggregate: join rows

Date / time

Function	Result
`now() / current_date`	Current moment / day
`age(ts)`	Interval from now
`date_trunc('month',ts)`	Round down to unit
`extract(year from ts)`	Pull a field
`ts + interval '1 day'`	Date arithmetic
`to_char(ts,'YYYY-MM')`	Format to text
`to_date(s,'DD/MM/YYYY')`	Parse from text
`generate_series(a,b,step)`	Row-set of values
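Several of these functions compose naturally. The sketch below generates one row per day for an illustrative date range, then buckets the days by month.

```sql
SELECT to_char(date_trunc('month', d), 'YYYY-MM') AS month,
       count(*)                                   AS days
FROM generate_series('2026-01-01'::timestamp,
                     '2026-03-31'::timestamp,
                     interval '1 day') AS g(d)
GROUP BY 1
ORDER BY 1;
-- 2026-01 | 31
-- 2026-02 | 28
-- 2026-03 | 31
```

The same pattern, LEFT JOINed to a fact table, is a common way to report on days or months that have no rows at all.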

Math & conversion

Function	Result
`round(x,n) / trunc`	Round / truncate
`ceil / floor / abs`	Standard math
`mod(a,b) / a % b`	Remainder
`power(a,b) / sqrt`	Exponent / root
`random()`	0 ≤ x < 1
`x::type / cast(x as t)`	Convert type

Aggregates

Function	Result
`count(*) / count(col)`	All rows / non-NULL
`sum / avg / min / max`	Standard rollups
`array_agg(x)`	Collect into array
`string_agg(x,sep)`	Collect into string
`jsonb_agg(x)`	Collect into JSON
`bool_and / bool_or`	Logical rollup
`percentile_cont(0.5) WITHIN GROUP (ORDER BY x)`	Median

FILTER beats CASE inside aggregates. Instead of SUM(CASE WHEN paid THEN amt END), write SUM(amt) FILTER (WHERE paid) — clearer and NULL-clean. Any aggregate accepts FILTER (WHERE …).

DISTINCT inside an aggregate. COUNT(DISTINCT user_id) counts unique values; plain COUNT(user_id) counts non-NULL occurrences — they differ whenever duplicates exist.
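The three counts differ on the same four rows. The inline data is illustrative only.

```sql
WITH t(user_id) AS (
  VALUES (1), (1), (2), (NULL)
)
SELECT count(*)                AS all_rows,       -- 4
       count(user_id)          AS non_null,       -- 3
       count(DISTINCT user_id) AS unique_users    -- 2
FROM t;
```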

Scalar = per row · Aggregate = many → one · aggregates also run as window functions with OVER. Docs: postgresql.org/docs/18/functions.html

postgres=# \d -- combine relations   [05 / 09]

JOINs, Set Ops & Subqueries

Three ways to combine relations: JOINs widen rows (more columns), set operations stack rows (same columns), subqueries nest one query inside another.

Join grammar

from_item = source { join } ;
join      = ( [INNER] | (LEFT|RIGHT|FULL) [OUTER] ) JOIN source
              ( ON cond | USING "("col…")" )
          | CROSS JOIN source
          | NATURAL … JOIN source
          | [LATERAL] … ;

Join	Keeps	Unmatched rows
`INNER`	Matching pairs only	Dropped both sides
`LEFT OUTER`	All left + matches	Left kept, right = NULL
`RIGHT OUTER`	All right + matches	Right kept, left = NULL
`FULL OUTER`	Everything	Either side padded NULL
`CROSS`	Every combination	Cartesian product
`LATERAL`	Per-row correlated subquery	Right may reference left's columns
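The "unmatched rows" column is easiest to see on tiny inputs. A sketch with made-up dept and emp rows, including one employee with no department:

```sql
-- Hypothetical inline data.
WITH dept(id, name) AS (
  VALUES (1, 'East'), (2, 'West')
),
emp(name, dept_id) AS (
  VALUES ('Asha', 1), ('Bina', 1), ('Chandu', NULL)
)
SELECT d.name AS dept, e.name AS emp
FROM dept d
LEFT JOIN emp e ON e.dept_id = d.id
ORDER BY d.name, e.name;
-- East | Asha
-- East | Bina
-- West | NULL     (left row kept, right side padded with NULL)
```

Chandu never appears because the right side is only kept when it matches. Switching to FULL JOIN would add a row with a NULL dept for him, and INNER JOIN would drop the West row.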

ON vs USING vs NATURAL

ON a.id=b.id — explicit, both columns survive.
USING (id) — equal-named cols, merged to one.
NATURAL JOIN — auto-matches all shared names; convenient but fragile — a new shared column silently changes results.

Set operations

q1 UNION [ALL] q2   -- combine
q1 INTERSECT q2     -- in both
q1 EXCEPT q2        -- in q1 not q2

Same column count & compatible types. Default dedups; ALL keeps dups. INTERSECT binds tighter than UNION/EXCEPT.
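The precedence rule changes results, as this minimal sketch with constant queries shows:

```sql
-- Parsed as: SELECT 1 UNION (SELECT 2 INTERSECT SELECT 3)
SELECT 1 UNION SELECT 2 INTERSECT SELECT 3;       -- one row: 1

-- Parentheses force left-to-right evaluation instead
(SELECT 1 UNION SELECT 2) INTERSECT SELECT 3;     -- zero rows
```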

Subqueries — four shapes

-- 1. Scalar: returns one value, usable anywhere a value fits
SELECT name, salary,
       (SELECT avg(salary) FROM emp) AS company_avg
FROM emp;

-- 2. IN / ANY / ALL: membership against a column of values
SELECT * FROM emp
WHERE dept_id IN (SELECT id FROM dept WHERE region='East');

-- 3. EXISTS: correlated true/false test; stops at first match, NULL-safe
SELECT * FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id);

-- 4. Derived table: a subquery in FROM, aliased like a table
SELECT region, max_sal
FROM (SELECT region, max(salary) max_sal FROM emp GROUP BY region) t;

EXISTS over IN for NULL safety. NOT IN (subquery) returns zero rows if the subquery yields a single NULL. NOT EXISTS has no such trap and usually plans better for anti-joins.
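The trap is reproducible with two tiny inline tables. This is a sketch: the data is made up, with one emp row deliberately carrying a NULL dept_id.

```sql
-- Hypothetical inline data.
WITH emp(id, dept_id) AS (
  VALUES (1, 10), (2, NULL)
),
dept(id) AS (
  VALUES (10), (20)
)
SELECT (SELECT count(*) FROM dept d
        WHERE d.id NOT IN (SELECT dept_id FROM emp))        AS not_in_rows,      -- 0
       (SELECT count(*) FROM dept d
        WHERE NOT EXISTS (SELECT 1 FROM emp e
                          WHERE e.dept_id = d.id))          AS not_exists_rows;  -- 1 (dept 20)
```

For dept 20, NOT IN evaluates 20 <> 10 AND 20 <> NULL, which is TRUE AND NULL, so the result is NULL and the row is rejected.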

JOIN = wider rows · SET = stacked rows · SUBQUERY = nested logic. Correlated subqueries re-run per outer row. Docs: postgresql.org/docs/18/queries-table-expressions.html

postgres=# -- collapse vs keep   [06 / 09]

GROUP BY vs Window Functions

Same arithmetic, opposite shapes. GROUP BY folds many rows into one summary. A window function keeps every row and attaches a value computed over a window of related rows.

GROUP BY — collapses (N → G)

SELECT dept, SUM(sales) AS total FROM s GROUP BY dept;

dept	total
East	400
West	800

Employees gone — you can't also select employee; no single value per group.

WINDOW — preserves (N → N)

SELECT employee, dept, sales, SUM(sales) OVER (PARTITION BY dept) AS dt FROM s;

emp	dept	sales	dt
Asha	East	100	400
Bina	East	300	400
Deepa	West	600	800

Every row survives and knows its group total. Detail and aggregate together.

The OVER clause grammar

window_call = func "("[args]")" [ FILTER "("WHERE cond")" ] OVER ( window_def ) ;
window_def  = [ PARTITION BY expr… ]   -- like GROUP BY, rows NOT collapsed
              [ ORDER BY sort… ]       -- order for ranking & running totals
              [ frame ] ;
frame       = (ROWS|RANGE|GROUPS) BETWEEN start AND end ;
start/end   = UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW
            | n FOLLOWING | UNBOUNDED FOLLOWING ;

Window function catalog

Function	Returns	Needs ORDER BY
`ROW_NUMBER()`	Unique sequential index	Yes
`RANK() / DENSE_RANK()`	Rank with / without gaps after ties	Yes
`LAG(x,n) / LEAD(x,n)`	Value n rows back / ahead	Yes
`SUM(x) OVER(ORDER BY …)`	Running / cumulative total	Yes
`NTILE(k)`	Split into k buckets	Yes
`FIRST_VALUE / LAST_VALUE / NTH_VALUE`	Value at frame edge / position	Usually
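A sketch that exercises three catalog entries at once. The inline rows are illustrative, and the Chandu row in particular is an assumption added so each dept has two rows.

```sql
-- Hypothetical inline data.
WITH s(employee, dept, sales) AS (
  VALUES ('Asha', 'East', 100), ('Bina', 'East', 300),
         ('Chandu', 'West', 200), ('Deepa', 'West', 600)
)
SELECT employee, dept, sales,
       RANK() OVER (PARTITION BY dept ORDER BY sales DESC)            AS dept_rank,
       SUM(sales) OVER (PARTITION BY dept ORDER BY sales
                        ROWS BETWEEN UNBOUNDED PRECEDING
                                 AND CURRENT ROW)                     AS running,
       sales - LAG(sales) OVER (PARTITION BY dept ORDER BY sales)     AS diff_prev
FROM s
ORDER BY dept, sales;
-- Asha   | East | 100 | 2 | 100 | NULL
-- Bina   | East | 300 | 1 | 400 | 200
-- Chandu | West | 200 | 2 | 200 | NULL
-- Deepa  | West | 600 | 1 | 800 | 400
```

The explicit ROWS frame is a deliberate choice. The default frame with ORDER BY is RANGE-based, which adds tied rows together in a running total.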

Decision rule. One row per group (a report)? → GROUP BY. Every row plus a peer-computed value (rank, running total, share, diff-from-avg)? → window OVER (…). Windowing runs after grouping, so the two can coexist.
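Because windowing runs after grouping, a window function can aggregate over the grouped rows. A sketch with illustrative data, computing each dept's share of the grand total:

```sql
-- Hypothetical inline data.
WITH s(dept, sales) AS (
  VALUES ('East', 100), ('East', 300), ('West', 200), ('West', 600)
)
SELECT dept,
       SUM(sales)                                            AS total,
       round(100.0 * SUM(sales) / SUM(SUM(sales)) OVER (), 1) AS pct_of_all
FROM s
GROUP BY dept
ORDER BY dept;
-- East | 400 | 33.3
-- West | 800 | 66.7
```

The inner SUM(sales) is the GROUP BY aggregate; the outer SUM(…) OVER () is a window function over the two group rows.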

Math: GROUP BY = γ in bag algebra (many→one). Window = per-row map over an ordered context (one→one). Neither is in pure relational calculus. Docs: postgresql.org/docs/18/tutorial-window.html

postgres=# WITH RECURSIVE …   [07 / 09]

CTEs & Recursion

A Common Table Expression names a subquery up front, so complex logic reads top-to-bottom instead of inside-out. RECURSIVE extends this to hierarchies and graphs — the standard way to walk a tree.

Grammar

with_clause = WITH [ RECURSIVE ] cte {"," cte} ;
cte         = name [ "("col…")" ] AS [ [NOT] MATERIALIZED ] "(" query ")" ;

Plain CTE — readability

WITH regional AS (
  SELECT region, SUM(sales) total
  FROM s
  GROUP BY region
)
SELECT * FROM regional
WHERE total > 500
ORDER BY total DESC;

Name the intermediate result once, reuse it. Chains of CTEs read as sequential steps.

MATERIALIZED control

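Since PG 12 the planner may inline a CTE into the outer query for speed. By default that happens when the CTE is non-recursive, side-effect-free and referenced exactly once; a CTE referenced more than once is computed once. Force "compute once" with AS MATERIALIZED, or request inlining with AS NOT MATERIALIZED. Reach for MATERIALIZED when an expensive, singly referenced CTE should not be merged into the outer plan, and for NOT MATERIALIZED when a multiply referenced CTE would benefit from outer filters being pushed down.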

Recursive CTE — walking a hierarchy

Two parts joined by UNION ALL: a base case (anchor) and a recursive case that references the CTE itself. PG iterates until the recursive part returns no new rows.

-- employees(id, name, manager_id) → everyone under Asha, with depth
WITH RECURSIVE chain AS (
  -- base case: the starting node
  SELECT id, name, manager_id, 1 AS depth
  FROM employees
  WHERE name = 'Asha'
  UNION ALL
  -- recursive case: children of rows already found
  SELECT e.id, e.name, e.manager_id, c.depth + 1
  FROM employees e
  JOIN chain c ON e.manager_id = c.id
)
SELECT repeat(' ', depth-1) || name AS tree, depth
FROM chain
ORDER BY depth;

Input: employees

id	name	mgr
1	Asha	—
2	Bina	1
3	Chandu	1
4	Deepa	2

Output: chain

tree	depth
Asha	1
Bina	2
Chandu	2
Deepa	3

Guard against infinite loops. On cyclic graphs an unguarded recursive CTE never terminates. PG 14+ offers UNION … CYCLE col SET is_cycle USING path; otherwise track a visited path array and filter.
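The visited-path technique can be sketched on a three-node cycle. The edges table and all names here are illustrative, not from the original.

```sql
-- Hypothetical graph with a cycle: 1 → 2 → 3 → 1
WITH RECURSIVE edges(src, dst) AS (
  VALUES (1, 2), (2, 3), (3, 1)
),
walk(node, path) AS (
  SELECT 1, ARRAY[1]                    -- anchor: start at node 1
  UNION ALL
  SELECT e.dst, w.path || e.dst         -- extend the visited path
  FROM walk w
  JOIN edges e ON e.src = w.node
  WHERE e.dst <> ALL (w.path)           -- guard: skip nodes already on this path
)
SELECT node, path
FROM walk
ORDER BY cardinality(path);
-- 1 | {1}
-- 2 | {1,2}
-- 3 | {1,2,3}
```

Without the WHERE guard the edge 3 → 1 would be followed again and the recursion would never reach a fixpoint.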

Data-modifying CTEs (PG extension). WITH moved AS (DELETE FROM old WHERE … RETURNING *) INSERT INTO archive SELECT * FROM moved; — delete and re-insert atomically in one statement.

CTEs = top-down readability · RECURSIVE = anchor + UNION ALL self-reference until fixpoint. Docs: postgresql.org/docs/18/queries-with.html

postgres=# \h -- DML·DDL·TCL·DCL   [08 / 09]

Beyond SELECT: Write, Define, Control

DML changes rows · DDL changes structure · TCL controls transactions · DCL controls access.

DML — manipulate data

insert = INSERT INTO table ["("col…")"] ( VALUES row… | select_query )
         [ ON CONFLICT [target] ( DO NOTHING | DO UPDATE SET col"="EXCLUDED.col ) ]   -- UPSERT
         [ RETURNING ("*"|expr…) ] ;                                                  -- RETURNING = PG ext
update = UPDATE table SET col"="expr… [ FROM from_item ] [ WHERE cond ] [ RETURNING … ] ;
delete = DELETE FROM table [ USING from_item ] [ WHERE cond ] [ RETURNING … ] ;
merge  = MERGE INTO tgt USING src ON cond
         { WHEN [NOT] MATCHED THEN (INSERT|UPDATE|DELETE) } ;
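A concrete instance of the insert rule with ON CONFLICT and RETURNING. This is a sketch: the stock table and its columns are hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TEMP TABLE stock (sku text PRIMARY KEY, qty int NOT NULL);
INSERT INTO stock VALUES ('A1', 5);

INSERT INTO stock AS s (sku, qty)
VALUES ('A1', 3), ('B2', 7)
ON CONFLICT (sku)
  DO UPDATE SET qty = s.qty + EXCLUDED.qty   -- EXCLUDED = the row that failed to insert
RETURNING sku, qty;
-- A1 | 8     (existing 5 + incoming 3)
-- B2 | 7     (plain insert)
```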

WHERE-less UPDATE/DELETE hits every row. Wrap risky writes in BEGIN; … ROLLBACK; to preview the row count before committing.
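The preview pattern looks like this in practice. A sketch with a hypothetical accounts table:

```sql
-- Hypothetical table for illustration.
CREATE TEMP TABLE accounts (id int PRIMARY KEY, active boolean NOT NULL);
INSERT INTO accounts VALUES (1, true), (2, false), (3, false);

BEGIN;
DELETE FROM accounts WHERE NOT active;   -- psql reports: DELETE 2
SELECT count(*) FROM accounts;           -- 1, visible only inside this transaction
ROLLBACK;

SELECT count(*) FROM accounts;           -- 3: nothing was actually deleted
```

Once the reported row count matches expectations, rerun the same statements with COMMIT in place of ROLLBACK.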

DDL — define structure

create_table = CREATE [TEMP|UNLOGGED] TABLE [IF NOT EXISTS] name
               "(" col type {constraint} {"," …} ")"
               [ PARTITION BY (RANGE|LIST|HASH) … ] ;
constraint   = PRIMARY KEY | UNIQUE [NULLS NOT DISTINCT] | NOT NULL
             | DEFAULT expr | CHECK "("cond")"
             | GENERATED ALWAYS AS IDENTITY
             | REFERENCES tbl [ON DELETE action] ;
alter        = ALTER TABLE name ( ADD|DROP|ALTER COLUMN … | ADD CONSTRAINT … | RENAME … ) ;
drop         = DROP (TABLE|VIEW|INDEX) [IF EXISTS] name [CASCADE|RESTRICT] ;

TCL — transactions

BEGIN [ISOLATION LEVEL lvl]; SAVEPOINT sp; ROLLBACK TO sp; COMMIT; ROLLBACK;

Levels: READ COMMITTED (default) · REPEATABLE READ · SERIALIZABLE. ACID is per-transaction.

DCL — access

GRANT priv ON obj TO role; REVOKE priv ON obj FROM role;

priv = SELECT/INSERT/UPDATE/DELETE/ALL. Row-level security via CREATE POLICY.

RETURNING, ON CONFLICT & data-modifying CTEs are PostgreSQL extensions; NULLS NOT DISTINCT entered the standard in SQL:2023. Docs: postgresql.org/docs/18/sql-commands.html

postgres=# EXPLAIN ANALYZE …   [09 / 09]

Indexes & EXPLAIN Performance

An index is a shortcut the planner may use. Knowing which index suits which query — and reading the plan to confirm it's used — is the difference between a 5ms and a 5s query.

Index types — which when

Type	Best for	Example
B-tree (default)	Equality & range on scalars; sorting	`WHERE price > 100 ORDER BY price`
Hash	Equality only	`WHERE id = 42`
GIN	Multi-value: jsonb, arrays, full-text	`WHERE tags @> '{sql}'`
GiST	Geometric, ranges, nearest-neighbour	`WHERE geom && box`
BRIN	Huge, naturally-ordered tables (time-series)	`WHERE created_at > …`
SP-GiST	Non-balanced: quadtrees, IP ranges	`WHERE inet && '10.0.0.0/8'`

Targeted index recipes

-- composite: order matters (leftmost prefix)
CREATE INDEX ON orders (cust_id, created_at);
-- partial: index only rows you query
CREATE INDEX ON users (email) WHERE active;
-- expression: match the WHERE expression
CREATE INDEX ON users (lower(email));
-- covering: serve query from index alone
CREATE INDEX ON orders (cust_id) INCLUDE (total);

Why an index gets ignored

• Function on the column: WHERE lower(email)=… skips a plain email index — index the expression instead.
• Leading wildcard: LIKE '%x' can't use B-tree.
• Type mismatch forces a cast.
• Tiny table: a seq scan is genuinely faster.
• Stale stats: run ANALYZE.
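The first two bullets can be explored on throwaway data. This is a sketch under assumptions: the users table, index names and generated emails are hypothetical, and the plans you see depend on your version and settings, so no plan output is asserted.

```sql
-- Hypothetical table with 100,000 generated rows.
CREATE TEMP TABLE users (id int, email text);
INSERT INTO users
SELECT n, 'User' || n || '@Example.com'
FROM generate_series(1, 100000) AS g(n);

CREATE INDEX users_email_idx       ON users (email);          -- plain column index
CREATE INDEX users_email_lower_idx ON users (lower(email));   -- expression index
ANALYZE users;                                                -- fresh stats

-- Function on the column: only the expression index can serve this predicate
EXPLAIN SELECT id FROM users WHERE lower(email) = 'user42@example.com';

-- Leading wildcard: neither B-tree index can be used for the match
EXPLAIN SELECT id FROM users WHERE email LIKE '%42@Example.com';
```

Dropping users_email_lower_idx and rerunning the first EXPLAIN shows how the plan changes when only the plain email index exists.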

Reading EXPLAIN

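EXPLAIN [ "(" option {"," option} ")" ] statement ;
option = ANALYZE | BUFFERS | VERBOSE | FORMAT (TEXT|JSON|XML|YAML) ;
-- Shorthand EXPLAIN ANALYZE statement also works; BUFFERS and FORMAT need the parenthesized form.
-- ANALYZE actually runs it, showing real rows + timing (use a transaction for writes).

Index Scan using orders_cust_idx on orders
  (cost=0.42..8.4 rows=1 width=36)
  (actual time=0.018..0.020 rows=1 loops=1)   ← actual ≈ estimate = healthy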

Plan node	Meaning	Concern
`Seq Scan`	Read whole table	Bad on large tables w/ selective filter
`Index Scan / Index Only Scan`	Use index; "Only" = no heap fetch	Good
`Bitmap Heap Scan`	Many matching rows via index	Fine for medium selectivity
`Nested Loop`	Probe inner per outer row	Blows up if outer rows large
`Hash Join / Merge Join`	Build hash / merge sorted inputs	Good for large joins

The estimate-vs-actual gap is the tell. When the planner expected rows=1 but got rows=50000, its choices rested on wrong stats — run ANALYZE, then match the index to the actual WHERE/JOIN/ORDER BY columns.

Version note. Verified on PostgreSQL 18 (stable 18.4, May 2026); v19 GA expected Sept 2026 — additive only. Always run the latest minor of your major version.

Index = optional shortcut · EXPLAIN ANALYZE = ground truth · match index to real query shape. Docs: postgresql.org/docs/18/using-explain.html

postgis=# CREATE EXTENSION postgis;   [10 / 12 · POSTGIS]

PostGIS — Spatial Types & SRID

PostGIS turns PostgreSQL into a spatial database: new column types, ~300 ST_ functions, and spatial indexes. Everything starts with two decisions — which geometry type, and geometry vs geography.


Enable & verify

CREATE EXTENSION IF NOT EXISTS postgis;   -- core: types, functions, GiST support
CREATE EXTENSION postgis_raster;          -- optional: raster / coverage
CREATE EXTENSION postgis_topology;        -- optional: topological model
SELECT postgis_full_version();            -- → POSTGIS 3.6.2, GEOS 3.14, PROJ 8.2 …

The geometry type hierarchy

Type	Is	WKT example
`POINT`	One location	`POINT(77.6 14.4)`
`LINESTRING`	Ordered vertices	`LINESTRING(0 0, 1 1, 2 1)`
`POLYGON`	Ring(s); first = outer	`POLYGON((0 0,4 0,4 4,0 4,0 0))`
`MULTIPOINT`	Many points	`MULTIPOINT((0 0),(1 1))`
`MULTILINESTRING`	Many lines	…
`MULTIPOLYGON`	Many polygons (islands)	…
`GEOMETRYCOLLECTION`	Mixed bag	`GEOMETRYCOLLECTION(POINT(0 0),…)`

Dimensionality suffixes

Append Z (elevation), M (measure), or ZM: POINTZ, LINESTRINGM, POINTZM.

Representations

Form	Meaning
WKT	Well-Known Text (human-readable)
WKB	Well-Known Binary (storage/wire)
EWKT/EWKB	PostGIS extended: embeds SRID + Z/M `SRID=4326;POINT(77.6 14.4)`

SRID — the coordinate reference system

Every geometry carries an SRID (Spatial Reference ID) declaring how its numbers map to the Earth. Stored in the spatial_ref_sys table. The two you'll meet constantly:

SRID	System	Units	Use
`4326`	WGS 84 lon/lat	degrees	GPS, global storage, web data
`3857`	Web Mercator	metres (projected)	Tile maps (Google/OSM/Leaflet)
`326xx/327xx`	UTM zone N/S	metres	Accurate regional measurement

Mixed SRIDs error out. ST_Distance(a,b) on SRID 4326 vs SRID 0 raises Operation on mixed SRID geometries. When both geometries carry a real SRID, reproject one first, e.g. ST_Transform(geom, 3857). Use ST_SetSRID only to label an unlabelled (SRID 0) geometry; it does not reproject.
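The label-versus-reproject distinction shows up in the coordinates themselves. A sketch assuming the postgis extension is installed; the point is the deck's own WKT example.

```sql
SELECT ST_AsText(ST_SetSRID('SRID=4326;POINT(77.6 14.4)'::geometry, 3857))
         AS relabelled,       -- POINT(77.6 14.4): numbers untouched, only the label changed
       ST_AsText(ST_Transform('SRID=4326;POINT(77.6 14.4)'::geometry, 3857))
         AS reprojected,      -- coordinates recomputed into Web Mercator metres
       ST_SRID(ST_Transform('SRID=4326;POINT(77.6 14.4)'::geometry, 3857))
         AS reprojected_srid; -- 3857
```

Relabelling lon/lat degrees as 3857 leaves a point a few metres from the Web Mercator origin, which is why ST_SetSRID is only safe on geometries with SRID 0.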

⚠ The defining gotcha: geometry ≠ geography

	geometry	geography
Math	Flat / Cartesian (planar)	Curved / geodesic (spheroid)
Units	Whatever the SRID says — 4326 ⇒ degrees	Always metres
Speed	Fast	Slower (great-circle math)
Function library	Full (~300 funcs)	Subset: ST_Distance, ST_DWithin, ST_Area, ST_Length, ST_Intersects, ST_Covers…
Best for	Projected/local data; heavy processing	Global lon/lat, correct distances out of the box

SRID 4326 on a geometry column is the #1 production bug. ST_Distance on two geometry(Point,4326) values returns a number in degrees — a "1 km" radius search silently matches points 111 km away. Fix: store as geography(Point,4326), or cast per-query: ST_Distance(a::geography, b::geography) → metres. A degree of longitude is ~111 km at the equator but shrinks to 0 at the poles, so degree "distance" is meaningless.
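The unit difference is visible on two points one degree apart on the equator. A sketch assuming the postgis extension is installed:

```sql
SELECT ST_Distance('SRID=4326;POINT(0 0)'::geometry,
                   'SRID=4326;POINT(1 0)'::geometry)   AS planar_degrees,    -- 1
       ST_Distance('SRID=4326;POINT(0 0)'::geography,
                   'SRID=4326;POINT(1 0)'::geography)  AS geodesic_metres;   -- roughly 111,000 (about 111 km)
```

The same two points give answers five orders of magnitude apart, and the geometry result carries no warning that its unit is degrees.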

Defining a spatial table

-- typed, constrained columns (recommended over bare 'geometry')
CREATE TABLE shelters (
  id       bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name     text NOT NULL,
  geom     geometry(Point, 4326),    -- planar; great for tile maps after transform
  location geography(Point, 4326)    -- geodesic; metres for nearby-search
);
-- registered metadata view (auto-maintained):
SELECT f_table_name, f_geometry_column, srid, type FROM geometry_columns;

PostGIS 3.6.2 (Feb 2026) · pairs with PostgreSQL 18, GEOS 3.14, PROJ 8.2 · OGC Simple Features compliant. Docs: postgis.net/docs · workshops/postgis-intro

postgis=# \df ST_* -- spatial functions   [11 / 12 · POSTGIS]

PostGIS — Functions & Queries

Nearly all spatial functions are prefixed ST_ (Spatial Type, from the OGC standard). Grouped by purpose: construct geometries, inspect them, test relationships, measure, process, and output.

Constructors

Function	Builds geometry from
`ST_MakePoint(x,y[,z])`	Raw coordinates (no SRID!)
`ST_Point(x,y,srid)`	Coordinates + SRID
`ST_GeomFromText(wkt,srid)`	WKT string
`ST_GeogFromText(ewkt)`	Text → geography
`ST_GeomFromGeoJSON(json)`	GeoJSON
`ST_SetSRID(geom,srid)`	Labels SRID (no reproject)

ST_MakePoint returns SRID 0. Always wrap: ST_SetSRID(ST_MakePoint(lon,lat),4326).
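A quick check of the wrapping rule, assuming the postgis extension is installed. The coordinates are the deck's own example point.

```sql
SELECT ST_SRID(ST_MakePoint(77.6, 14.4))                         AS raw_srid,    -- 0 (unlabelled)
       ST_SRID(ST_SetSRID(ST_MakePoint(77.6, 14.4), 4326))       AS labelled,    -- 4326
       ST_AsEWKT(ST_SetSRID(ST_MakePoint(77.6, 14.4), 4326))     AS ewkt;        -- SRID=4326;POINT(77.6 14.4)
```

Note the argument order: longitude (X) first, latitude (Y) second.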

Accessors / inspectors

Function	Returns
`ST_X(pt) / ST_Y(pt)`	Coordinate (geometry only)
`ST_GeometryType(g)`	`ST_Point` …
`ST_SRID(g)`	The SRID
`ST_NPoints(g)`	Vertex count
`ST_IsValid(g)`	OGC validity (self-intersect?)
`ST_AsText(g) / ST_AsEWKT(g)`	Back to WKT / EWKT

Many functions assume valid geometry. Repair with ST_MakeValid(geom) before area/overlay operations.

Spatial relationships — return boolean; most are index-aware

Function	True when
`ST_Intersects(a,b)`	They share any point (most common test)
`ST_Contains(a,b)` / `ST_Within(a,b)`	a fully contains b / a is within b (the inverse of ST_Contains)
`ST_Covers(a,b)` / `ST_CoveredBy`	Like contains, boundary-inclusive (preferred)
`ST_DWithin(a,b,d)`	Within distance d (metres for geography) — index-aware
`ST_Touches / ST_Crosses / ST_Overlaps`	Boundary-only / partial / same-dim overlap
`ST_Disjoint(a,b)`	Share no point (not index-aware — negation)

Measurement

Function	Returns
`ST_Distance(a,b)`	Min distance (SRID units / m for geog)
`ST_Length(line)`	Length
`ST_Area(poly)`	Area
`ST_Perimeter(poly)`	Boundary length
`ST_Azimuth(a,b)`	Bearing in radians

Processing / output

Function	Produces
`ST_Buffer(g,d)`	Zone within d (geometry)
`ST_Transform(g,srid)`	Reproject to new CRS
`ST_Union(g) / ST_Intersection`	Merge / overlap
`ST_Centroid / ST_Envelope`	Center / bounding box
`ST_Simplify(g,tol)`	Fewer vertices
`ST_AsGeoJSON / ST_AsMVT`	Web output / vector tiles

Worked query — nearest shelters within 2 km

-- geography column → distance in METRES, ST_DWithin uses the spatial index
SELECT name, ST_Distance(location, ref.g) AS metres
FROM shelters,
     (SELECT ST_SetSRID(ST_MakePoint(79.99, 14.44), 4326)::geography AS g) ref
WHERE ST_DWithin(location, ref.g, 2000)   -- 2000 m; prefiltered by GiST index
ORDER BY location <-> ref.g               -- KNN operator: true index-ordered nearest-first
LIMIT 5;

Worked query — spatial join (points-in-polygon count)

-- "how many shelters in each district?" ST_Contains is index-aware
SELECT d.name, count(s.*) AS n
FROM districts d
LEFT JOIN shelters s ON ST_Contains(d.geom, s.geom)
GROUP BY d.name
ORDER BY n DESC;

Constructor SRID discipline. The three most common "why is my result wrong" causes: forgetting SRID on ST_MakePoint (defaults to 0), storing lon/lat in geometry instead of geography, and passing lat/lon reversed — PostGIS expects (longitude, latitude), i.e. (X, Y).

~300 ST_ functions; geography supports a deliberate subset. Index-aware predicates auto-add a bounding-box prefilter. Docs: postgis.net/docs/reference.html

postgis=# CREATE INDEX … USING GIST   [12 / 12 · POSTGIS]

PostGIS — Spatial Indexing & Performance

A spatial index can't index irregular shapes directly, so it indexes each geometry's bounding box in a GiST R-Tree. Queries then run in two phases: a fast index filter, then an exact refine.

Create the index — and the one fatal typo

CREATE INDEX shelters_geom_gix ON shelters USING GIST (geom);
ANALYZE shelters;   -- refresh planner stats so the index actually gets chosen
-- geography indexes identically; GiST handles the sphere/dateline/poles correctly
CREATE INDEX shelters_geog_gix ON shelters USING GIST (location);

Omit USING GIST and you get a B-tree. A B-tree tries to index the whole geometry and errors with index row requires N bytes, maximum size is 8191 — and even when it builds, it can't answer spatial queries. The USING GIST clause is mandatory for spatial columns.

The filter-and-refine model

1. Index filter (bounding box): GiST R-Tree returns rows whose bounding boxes overlap the query box. Fast, approximate, may include false positives. → candidates
2. Exact refine (true geometry): The exact predicate (e.g. ST_Intersects) runs only on candidates, confirming real matches. → result

Index-aware functions (ST_Intersects, ST_Contains, ST_Covers, ST_DWithin, ST_Within …) inject phase 1 automatically. ST_DWithin internally expands the bounding box by the distance and applies && on both sides — which is why it's vastly faster than filtering on ST_Distance(...) < d.
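The two filter styles return the same rows but give the planner very different options. A sketch assuming the postgis extension is installed; demo_shelters and its two points are made up for illustration.

```sql
-- Hypothetical table: one point about 1.1 km from the reference, one about 55 km away.
CREATE TEMP TABLE demo_shelters (name text, location geography(Point, 4326));
INSERT INTO demo_shelters VALUES
  ('near', 'SRID=4326;POINT(79.99 14.45)'),
  ('far',  'SRID=4326;POINT(80.50 14.44)');
CREATE INDEX demo_shelters_gix ON demo_shelters USING GIST (location);

-- Index-aware: bounding-box prefilter first, exact distance test on the candidates
SELECT name
FROM demo_shelters
WHERE ST_DWithin(location, 'SRID=4326;POINT(79.99 14.44)'::geography, 2000);   -- near

-- Same answer, but the distance is computed for every row; no prefilter is injected
SELECT name
FROM demo_shelters
WHERE ST_Distance(location, 'SRID=4326;POINT(79.99 14.44)'::geography) < 2000; -- near
```

On a two-row table the planner will likely scan sequentially either way; the difference only shows in EXPLAIN once the table is large.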

Spatial operators

Op	Meaning	Use
`&&`	Bounding boxes overlap/touch (2-D)	Manual index prefilter; pair with exact test
`<->`	Distance between geometries (KNN)	`ORDER BY geom <-> pt` — index-ordered nearest-neighbour
`<#>`	Distance between bounding boxes (KNN)	Approximate nearest, cheapest
`~ / @`	Box contains / contained by	Pure box containment

Why ST_Relate & ST_Distance stay slow. They are not index-aware — no automatic && prefilter. For a bounding-box search without an exact predicate, add && yourself: WHERE a.geom && b.geom. For nearest-neighbour, prefer the <-> KNN operator over sorting by ST_Distance.

Index choice for spatial data

Index	When
GiST	Default. R-Tree over bounding boxes; handles all geometry & geography
SP-GiST	Point-heavy data with strong spatial clustering (space-partitioned)
BRIN	Huge tables already physically sorted by location; tiny index, weaker filtering — poor for trajectories

Confirm the index is used

EXPLAIN ANALYZE
SELECT name FROM shelters
WHERE ST_DWithin(location, :pt::geography, 2000);
-- want to see: Index Scan using shelters_geog_gix (not Seq Scan)

If you still see Seq Scan: run ANALYZE (stale stats), check the function is index-aware, confirm both sides share one SRID/type, and ensure you didn't wrap the column in a function (ST_Transform(geom,…) in the WHERE defeats the index — index the transformed expression instead, or store a second projected column).

GiST R-Tree on bounding boxes · filter-then-refine · index-aware predicates auto-prefilter · <-> for true KNN. PostGIS 3.6.2. Docs: postgis.net/workshops/postgis-intro/indexing.html

PostgreSQL Master Reference
