postgres=#\h SELECT█01 / 09

The SELECT Statement — Grammar & Execution Flow

SQL is written in one order and executed in another. The grammar says what's legal; the pipeline says why — why an alias works in ORDER BY but not WHERE, why aggregates are banned in WHERE but allowed in HAVING.

nonterminal keyword metasymbol PostgreSQL extension

Complete SELECT grammar (written order)

select_query = [ with_clause ] query_body [ ORDER BY sort_item {"," sort_item} ] [ limit_clause ] [ lock ] ";" ; query_body = query_term { ( UNION | EXCEPT ) [ ALL ] query_term } ; query_term = query_prim { INTERSECT [ ALL ] query_prim } ; query_prim = select_block | VALUES rows | TABLE name | "(" query_body ")" ; select_block = SELECT [ ALL | DISTINCT [ ON "(" expr {"," expr} ")" ] ] select_list [ FROM from_item {"," from_item} ] -- PG: FROM optional → SELECT now(); [ WHERE condition ] [ GROUP BY group_item {"," group_item} ] [ HAVING condition ] [ WINDOW name AS "(" window_def ")" ] ; limit_clause = LIMIT (n|ALL) [ OFFSET m ] | OFFSET m [ FETCH (FIRST|NEXT) n ROWS (ONLY|WITH TIES) ] ;

The execution pipeline — tables in, table out

1

FROM / JOINrelation ⋈

Assemble sources; multiple tables cross-join, then JOIN conditions apply. → T₁

2

WHEREselect σ

Keep rows where predicate is TRUE (not FALSE/NULL). No aggregates — groups don't exist yet. → T₂

3

GROUP BYgroup γ

Partition by key; collapse each group to one row. Cardinality drops. → T₃

4

HAVINGselect σ

Filter group rows; aggregates legal here because groups now exist. → T₄

5

SELECTproject π

Evaluate output expressions & window functions; assign aliases — result columns now exist. → T₅

6

DISTINCTdedup δ

Remove duplicate result rows. → T₆

7

ORDER BYsort τ

Impose order; aliases from step 5 visible here — the only place order is guaranteed. → T₇

8

LIMIT / OFFSETslice

Skip OFFSET, return at most LIMIT. Delivered to client. → RESULT

The alias rule, explained by the pipeline. WHERE (2) runs before SELECT (5), so a SELECT alias is invisible to WHERE/HAVING — repeat the full expression. ORDER BY (7) runs after SELECT, so it can use the alias.

Relational-algebra equivalent

SELECT dept, SUM(sales) AS total FROM s WHERE region='East' GROUP BY dept HAVING SUM(sales)>150 ORDER BY total DESC; τ_total↓ ( π_dept,total ( σ_SUM>150 ( _deptγ_{SUM(sales)→total} ( σ_{region='East'} ( s ) ) ) ) ) sort project HAVING GROUP+aggregate WHERE source

Three PostgreSQL divergences. ① LIMIT/OFFSET aren't standard (portable: OFFSET … FETCH). ② DISTINCT ON (k) keeps the first row per key — PG-only. ③ FROM may be omitted entirely.

Execution order is logical; the planner may reorder physically but results match this model.postgresql.org/docs/18/sql-select.html

postgres=#\dT -- data types█02 / 09

Data Types — the Vocabulary

Every column, expression and comparison has a type. Choosing well prevents silent bugs (float money, naive timestamps) and unlocks the right operators and indexes.

Numeric

Type	Use for	Note
`smallint/int/bigint`	Whole numbers	2 / 4 / 8 bytes
`numeric(p,s)`	Money, exact	Arbitrary precision, no rounding error
`real / double precision`	Scientific	Float — never for currency
`serial / bigserial`	Auto-increment	Legacy; prefer `GENERATED … IDENTITY`

Text

Type	Use for	Note
`text`	Default choice	Unlimited; no perf penalty in PG
`varchar(n)`	Length-capped	Enforces a max; otherwise = text
`char(n)`	Fixed width	Space-padded; rarely wanted

text and varchar perform identically in PG — reach for text unless you need a hard length limit.

Temporal

Type	Stores
`date`	Calendar day, no time
`time [tz]`	Time of day
`timestamp`	Date+time, no zone (naive)
`timestamptz`	Zone-aware — use this
`interval`	Duration (`'2 days 3 hours'`)

Default to timestamptz. It stores UTC and converts on display; timestamp drops zone context and causes off-by-hours bugs.

Boolean, UUID & semi-structured

Type	Values / use
`boolean`	`TRUE` / `FALSE` / `NULL`
`uuid`	128-bit id; `gen_random_uuid()`
`bytea`	Raw binary
`enum`	`CREATE TYPE mood AS ENUM(…)`
`jsonb`	Indexed JSON; `->` `->>` `@>`
`type[]`	Array: `int[]`, `text[]`
`tsvector` / `int4range`	Full-text / range types

Casting & literals

SELECT '2026-06-29'::date, -- PG cast shorthand :: CAST('42' AS int), -- SQL-standard cast 3.0::numeric / 2, -- 1.5 (int/int truncates to 1) '{1,2,3}'::int[], -- array literal '{"a":1}'::jsonb ->> 'a'; -- → '1' (text)

Integer division truncates. 5/2 = 2. Force real division by casting one side: 5::numeric/2 = 2.5.

Type drives operators, indexes & comparison semantics — choose deliberately.postgresql.org/docs/18/datatype.html

postgres=#-- WHERE · NULL · CASE█03 / 09

Filtering, NULL Logic & CASE

WHERE keeps a row only when its condition is TRUE. NULL turns Boolean logic from two values into three — the source of more silent bugs than any other SQL feature.

Predicate toolkit

Predicate	Matches	Example
`= <> < > <= >=`	Comparison	`price >= 100`
`BETWEEN a AND b`	Inclusive range	`age BETWEEN 18 AND 65`
`IN (…)`	Any of a set	`state IN ('AP','TS')`
`LIKE` / `ILIKE`	Pattern (`%` `_`); ILIKE case-insensitive	`name ILIKE 'kal%'`
`~` / `~*`	POSIX regex / case-insensitive	`code ~ '^[A-Z]{3}$'`
`IS [NOT] NULL`	Null test	`deleted_at IS NULL`
`IS [NOT] DISTINCT FROM`	NULL-safe equality	`a IS NOT DISTINCT FROM b`

Three-valued logic

Any comparison with NULL yields UNKNOWN. A row passes WHERE only on TRUE.

AND	T	F	NULL
T	T	F	NULL
F	F	F	F
NULL	NULL	F	NULL

OR	T	F	NULL
T	T	T	T
F	T	F	NULL
NULL	T	NULL	NULL

NULL traps & fixes

Trap	Fix
`x = NULL` never true	`x IS NULL`
`NOT IN (…,NULL)` → 0 rows	Filter NULLs / `NOT EXISTS`
`COUNT(col)` skips NULLs	`COUNT(*)` for total rows
`a = b` false when both NULL	`IS NOT DISTINCT FROM`
`'x'\|\|NULL` → NULL	`COALESCE(col,'')`

UNIQUE allows many NULLs. NULL ≠ NULL, so duplicate NULLs pass a unique constraint unless you add NULLS NOT DISTINCT (PG 15+).

Conditional expressions

-- CASE: the if/else of SQL SELECT name, CASE WHEN score >= 90 THEN 'A' WHEN score >= 75 THEN 'B' ELSE 'C' END AS grade FROM students; -- NULL helpers COALESCE(a, b, 0) -- first non-NULL argument NULLIF(x, 0) -- NULL if x=0 (guard division) GREATEST(a,b,c) / LEAST(a,b,c) -- ignore NULLs

Conditional aggregation — pivot without a pivot: SUM(CASE WHEN region='East' THEN sales ELSE 0 END). Cleaner: SUM(sales) FILTER (WHERE region='East').

WHERE filters rows (pre-group); HAVING filters groups (post-group).postgresql.org/docs/18/functions-conditional.html

postgres=#\df -- functions█04 / 09

Functions & Operators

The working toolkit, grouped by what you're transforming. Scalar functions act per row; aggregates collapse many rows into one.

String

Function	Result
`a \|\| b`	Concatenate
`length(s)`	Character count
`lower / upper / initcap`	Case transforms
`trim / ltrim / rtrim`	Strip whitespace
`substring(s,from,len)`	Extract slice
`replace(s,old,new)`	Substitute
`split_part(s,delim,n)`	Nth field
`position(sub in s)`	Index of substring
`format('%s=%s',k,v)`	printf-style build
`string_agg(x,',')`	aggregate: join rows

Date / time

Function	Result
`now() / current_date`	Current moment / day
`age(ts)`	Interval from now
`date_trunc('month',ts)`	Round down to unit
`extract(year from ts)`	Pull a field
`ts + interval '1 day'`	Date arithmetic
`to_char(ts,'YYYY-MM')`	Format to text
`to_date(s,'DD/MM/YYYY')`	Parse from text
`generate_series(a,b,step)`	Row-set of values

Math & conversion

Function	Result
`round(x,n) / trunc`	Round / truncate
`ceil / floor / abs`	Standard math
`mod(a,b) / a % b`	Remainder
`power(a,b) / sqrt`	Exponent / root
`random()`	0 ≤ x < 1
`x::type / cast(x as t)`	Convert type

Aggregates

Function	Result
`count(*) / count(col)`	All rows / non-NULL
`sum / avg / min / max`	Standard rollups
`array_agg(x)`	Collect into array
`string_agg(x,sep)`	Collect into string
`jsonb_agg(x)`	Collect into JSON
`bool_and / bool_or`	Logical rollup
`percentile_cont(0.5) WITHIN GROUP (ORDER BY x)`	Median

FILTER beats CASE inside aggregates. Instead of SUM(CASE WHEN paid THEN amt END), write SUM(amt) FILTER (WHERE paid) — clearer and NULL-clean. Any aggregate accepts FILTER (WHERE …).

DISTINCT inside an aggregate. COUNT(DISTINCT user_id) counts unique values; plain COUNT(user_id) counts non-NULL occurrences — they differ whenever duplicates exist.

Scalar = per row · Aggregate = many → one · both composable with window OVER.postgresql.org/docs/18/functions.html

postgres=#\d -- combine relations█05 / 09

JOINs, Set Ops & Subqueries

Three ways to combine relations: JOINs widen rows (more columns), set operations stack rows (same columns), subqueries nest one query inside another.

Join grammar

from_item = source { join } ; join = ( [INNER] | (LEFT|RIGHT|FULL) [OUTER] ) JOIN source ( ON cond | USING "("col…")" ) | CROSS JOIN source | NATURAL … JOIN source | [LATERAL] … ;

Join	Keeps	Unmatched rows
`INNER`	Matching pairs only	Dropped both sides
`LEFT OUTER`	All left + matches	Left kept, right = NULL
`RIGHT OUTER`	All right + matches	Right kept, left = NULL
`FULL OUTER`	Everything	Either side padded NULL
`CROSS`	Every combination	Cartesian product
`LATERAL`	Per-row correlated subquery	Right may reference left's columns

ON vs USING vs NATURAL

ON a.id=b.id — explicit, both columns survive.
USING (id) — equal-named cols, merged to one.
NATURAL JOIN — auto-matches all shared names; convenient but fragile — a new shared column silently changes results.

Set operations

q1 UNION [ALL] q2 -- combine q1 INTERSECT q2 -- in both q1 EXCEPT q2 -- in q1 not q2

Same column count & compatible types. Default dedups; ALL keeps dups. INTERSECT binds tighter than UNION/EXCEPT.

Subqueries — four shapes

-- 1. Scalar: returns one value, usable anywhere a value fits SELECT name, salary, (SELECT avg(salary) FROM emp) AS company_avg FROM emp; -- 2. IN / ANY / ALL: membership against a column of values SELECT * FROM emp WHERE dept_id IN (SELECT id FROM dept WHERE region='East'); -- 3. EXISTS: correlated true/false test — stops at first match, NULL-safe SELECT * FROM dept d WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id); -- 4. Derived table: a subquery in FROM, aliased like a table SELECT region, max_sal FROM (SELECT region, max(salary) max_sal FROM emp GROUP BY region) t;

EXISTS over IN for NULL safety. NOT IN (subquery) returns zero rows if the subquery yields a single NULL. NOT EXISTS has no such trap and usually plans better for anti-joins.

JOIN = wider rows · SET = stacked rows · SUBQUERY = nested logic. Correlated subqueries re-run per outer row.postgresql.org/docs/18/queries-table-expressions.html

postgres=#-- collapse vs keep█06 / 09

GROUP BY vs Window Functions

Same arithmetic, opposite shapes. GROUP BY folds many rows into one summary. A window function keeps every row and attaches a value computed over a window of related rows.

GROUP BY — collapses (N → G)

SELECT dept, SUM(sales) AS total FROM s GROUP BY dept;

dept	total
East	400
West	800

Employees gone — you can't also select employee; no single value per group.

WINDOW — preserves (N → N)

SELECT employee, dept, sales, SUM(sales) OVER (PARTITION BY dept) AS dt FROM s;

emp	dept	sales	dt
Asha	East	100	400
Bina	East	300	400
Deepa	West	600	800

Every row survives and knows its group total. Detail and aggregate together.

The OVER clause grammar

window_call = func "("[args]")" [ FILTER "("WHERE cond")" ] OVER ( window_def ) ; window_def = [ PARTITION BY expr… ] -- like GROUP BY, rows NOT collapsed [ ORDER BY sort… ] -- order for ranking & running totals [ frame ] ; frame = (ROWS|RANGE|GROUPS) BETWEEN start AND end ; start/end = UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING ;

Window function catalog

Function	Returns	Needs ORDER BY
`ROW_NUMBER()`	Unique sequential index	Yes
`RANK() / DENSE_RANK()`	Rank with / without gaps after ties	Yes
`LAG(x,n) / LEAD(x,n)`	Value n rows back / ahead	Yes
`SUM(x) OVER(ORDER BY …)`	Running / cumulative total	Yes
`NTILE(k)`	Split into k buckets	Yes
`FIRST_VALUE / LAST_VALUE / NTH_VALUE`	Value at frame edge / position	Usually

Decision rule. One row per group (a report)? → GROUP BY. Every row plus a peer-computed value (rank, running total, share, diff-from-avg)? → window OVER (…). Windowing runs after grouping, so the two can coexist.

Math: GROUP BY = γ in bag algebra (many→one). Window = per-row map over an ordered context (one→one). Neither is in pure relational calculus.postgresql.org/docs/18/tutorial-window.html

postgres=#WITH RECURSIVE …█07 / 09

CTEs & Recursion

A Common Table Expression names a subquery up front, so complex logic reads top-to-bottom instead of inside-out. RECURSIVE extends this to hierarchies and graphs — the standard way to walk a tree.

Grammar

with_clause = WITH [ RECURSIVE ] cte {"," cte} ; cte = name [ "("col…")" ] AS [ [NOT] MATERIALIZED ] "(" query ")" ;

Plain CTE — readability

WITH regional AS ( SELECT region, SUM(sales) total FROM s GROUP BY region ) SELECT * FROM regional WHERE total > 500 ORDER BY total DESC;

Name the intermediate result once, reuse it. Chains of CTEs read as sequential steps.

MATERIALIZED control

PG may inline a CTE into the outer query for speed. Force "compute once" with AS MATERIALIZED, or allow inlining with NOT MATERIALIZED. Use MATERIALIZED when the CTE is expensive and referenced multiple times.

Recursive CTE — walking a hierarchy

Two parts joined by UNION ALL: a base case (anchor) and a recursive case that references the CTE itself. PG iterates until the recursive part returns no new rows.

-- employees(id, name, manager_id) → everyone under Asha, with depth WITH RECURSIVE chain AS ( -- base case: the starting node SELECT id, name, manager_id, 1 AS depth FROM employees WHERE name = 'Asha' UNION ALL -- recursive case: children of rows already found SELECT e.id, e.name, e.manager_id, c.depth + 1 FROM employees e JOIN chain c ON e.manager_id = c.id ) SELECT repeat(' ', depth-1) || name AS tree, depth FROM chain ORDER BY depth;

Input: employees

id	name	mgr
1	Asha	—
2	Bina	1
3	Chandu	1
4	Deepa	2

Output: chain

tree	depth
Asha	1
Bina	2
Chandu	2
Deepa	3

Guard against infinite loops. On cyclic graphs an unguarded recursive CTE never terminates. PG 14+ offers UNION … CYCLE col SET is_cycle USING path; otherwise track a visited path array and filter.

Data-modifying CTEs (PG extension). WITH moved AS (DELETE FROM old WHERE … RETURNING *) INSERT INTO archive SELECT * FROM moved; — delete and re-insert atomically in one statement.

CTEs = top-down readability · RECURSIVE = anchor + UNION ALL self-reference until fixpoint.postgresql.org/docs/18/queries-with.html

postgres=#\h -- DML·DDL·TCL·DCL█08 / 09

Beyond SELECT: Write, Define, Control

DML changes rows · DDL changes structure · TCL controls transactions · DCL controls access.

DML — manipulate data

insert = INSERT INTO table ["("col…")"] ( VALUES row… | select_query ) [ ON CONFLICT [target] ( DO NOTHING | DO UPDATE SET col"="EXCLUDED.col ) ] -- UPSERT [ RETURNING ("*"|expr…) ] ; -- RETURNING = PG ext update = UPDATE table SET col"="expr… [ FROM from_item ] [ WHERE cond ] [ RETURNING … ] ; delete = DELETE FROM table [ USING from_item ] [ WHERE cond ] [ RETURNING … ] ; merge = MERGE INTO tgt USING src ON cond { WHEN [NOT] MATCHED THEN (INSERT|UPDATE|DELETE) } ;

WHERE-less UPDATE/DELETE hits every row. Wrap risky writes in BEGIN; … ROLLBACK; to preview the row count before committing.

DDL — define structure

create_table = CREATE [TEMP|UNLOGGED] TABLE [IF NOT EXISTS] name "(" col type {constraint} {"," …} ")" [ PARTITION BY (RANGE|LIST|HASH) … ] ; constraint = PRIMARY KEY | UNIQUE [NULLS NOT DISTINCT] | NOT NULL | DEFAULT expr | CHECK "("cond")" | GENERATED ALWAYS AS IDENTITY | REFERENCES tbl [ON DELETE action] ; alter = ALTER TABLE name ( ADD|DROP|ALTER COLUMN … | ADD CONSTRAINT … | RENAME … ) ; drop = DROP (TABLE|VIEW|INDEX) [IF EXISTS] name [CASCADE|RESTRICT] ;

TCL — transactions

BEGIN [ISOLATION LEVEL lvl]; SAVEPOINT sp; ROLLBACK TO sp; COMMIT; ROLLBACK;

Levels: READ COMMITTED (default) · REPEATABLE READ · SERIALIZABLE. ACID is per-transaction.

DCL — access

GRANT priv ON obj TO role; REVOKE priv ON obj FROM role;

priv = SELECT/INSERT/UPDATE/DELETE/ALL. Row-level security via CREATE POLICY.

RETURNING, ON CONFLICT, data-modifying CTEs & NULLS NOT DISTINCT are PostgreSQL extensions.postgresql.org/docs/18/sql-commands.html

postgres=#EXPLAIN ANALYZE …█09 / 09

Indexes & EXPLAIN Performance

An index is a shortcut the planner may use. Knowing which index suits which query — and reading the plan to confirm it's used — is the difference between a 5ms and a 5s query.

Index types — which when

Type	Best for	Example
B-tree (default)	Equality & range on scalars; sorting	`WHERE price > 100 ORDER BY price`
Hash	Equality only	`WHERE id = 42`
GIN	Multi-value: jsonb, arrays, full-text	`WHERE tags @> '{sql}'`
GiST	Geometric, ranges, nearest-neighbour	`WHERE geom && box`
BRIN	Huge, naturally-ordered tables (time-series)	`WHERE created_at > …`
SP-GiST	Non-balanced: quadtrees, IP ranges	`WHERE inet && '10.0.0.0/8'`

Targeted index recipes

-- composite: order matters (leftmost prefix) CREATE INDEX ON orders (cust_id, created_at); -- partial: index only rows you query CREATE INDEX ON users (email) WHERE active; -- expression: match the WHERE expression CREATE INDEX ON users (lower(email)); -- covering: serve query from index alone CREATE INDEX ON orders (cust_id) INCLUDE (total);

Why an index gets ignored

• Function on the column: WHERE lower(email)=… skips a plain email index — index the expression instead.
• Leading wildcard: LIKE '%x' can't use B-tree.
• Type mismatch forces a cast.
• Tiny table: a seq scan is genuinely faster.
• Stale stats: run ANALYZE.

Reading EXPLAIN

EXPLAIN [ ANALYZE ] [ BUFFERS ] [ FORMAT (TEXT|JSON) ] statement ; -- ANALYZE actually runs it, showing real rows + timing (use a transaction for writes). Index Scan using orders_cust_idx on orders (cost=0.42..8.4 rows=1 width=36) (actual time=0.018..0.020 rows=1 loops=1) ← actual ≈ estimate = healthy

Plan node	Meaning	Concern
`Seq Scan`	Read whole table	Bad on large tables w/ selective filter
`Index Scan / Index Only Scan`	Use index; "Only" = no heap fetch	Good
`Bitmap Heap Scan`	Many matching rows via index	Fine for medium selectivity
`Nested Loop`	Probe inner per outer row	Blows up if outer rows large
`Hash Join / Merge Join`	Build hash / merge sorted inputs	Good for large joins

The estimate-vs-actual gap is the tell. When the planner expected rows=1 but got rows=50000, its choices rested on wrong stats — run ANALYZE, then match the index to the actual WHERE/JOIN/ORDER BY columns.

Version note. Verified on PostgreSQL 18 (stable 18.4, May 2026); v19 GA expected Sept 2026 — additive only. Always run the latest minor of your major version.

Index = optional shortcut · EXPLAIN ANALYZE = ground truth · match index to real query shape.postgresql.org/docs/18/using-explain.html

postgis=#CREATE EXTENSION postgis;█10 / 12 · POSTGIS

PostGIS — Spatial Types & SRID

PostGIS turns PostgreSQL into a spatial database: new column types, ~300 ST_ functions, and spatial indexes. Everything starts with two decisions — which geometry type, and geometry vs geography.

spatial type function / keyword operator critical gotcha

Enable & verify

CREATE EXTENSION IF NOT EXISTS postgis; -- core: types, functions, GiST support CREATE EXTENSION postgis_raster; -- optional: raster / coverage CREATE EXTENSION postgis_topology; -- optional: topological model SELECT postgis_full_version(); -- → POSTGIS 3.6.2, GEOS 3.14, PROJ 8.2 …

The geometry type hierarchy

Type	Is	WKT example
`POINT`	One location	`POINT(77.6 14.4)`
`LINESTRING`	Ordered vertices	`LINESTRING(0 0, 1 1, 2 1)`
`POLYGON`	Ring(s); first = outer	`POLYGON((0 0,4 0,4 4,0 4,0 0))`
`MULTIPOINT`	Many points	`MULTIPOINT((0 0),(1 1))`
`MULTILINESTRING`	Many lines	…
`MULTIPOLYGON`	Many polygons (islands)	…
`GEOMETRYCOLLECTION`	Mixed bag	`GEOMETRYCOLLECTION(POINT(0 0),…)`

Dimensionality suffixes

Append Z (elevation), M (measure), or ZM: POINTZ, LINESTRINGM, POINTZM.

Representations

Form	Meaning
WKT	Well-Known Text (human-readable)
WKB	Well-Known Binary (storage/wire)
EWKT/EWKB	PostGIS extended: embeds SRID + Z/M `SRID=4326;POINT(77.6 14.4)`

SRID — the coordinate reference system

Every geometry carries an SRID (Spatial Reference ID) declaring how its numbers map to the Earth. Stored in the spatial_ref_sys table. The two you'll meet constantly:

SRID	System	Units	Use
`4326`	WGS 84 lon/lat	degrees	GPS, global storage, web data
`3857`	Web Mercator	metres (projected)	Tile maps (Google/OSM/Leaflet)
`326xx/327xx`	UTM zone N/S	metres	Accurate regional measurement

Mixed SRIDs error out. ST_Distance(a,b) on SRID 4326 vs SRID 0 raises Operation on mixed SRID geometries. Reproject one with ST_Transform(geom, 3857) first. Use ST_SetSRID only to label an unlabelled geometry — it does not reproject.

⚠ The defining gotcha: geometry ≠ geography

	geometry	geography
Math	Flat / Cartesian (planar)	Curved / geodesic (spheroid)
Units	Whatever the SRID says — 4326 ⇒ degrees	Always metres
Speed	Fast	Slower (great-circle math)
Function library	Full (~300 funcs)	Subset: ST_Distance, ST_DWithin, ST_Area, ST_Length, ST_Intersects, ST_Covers…
Best for	Projected/local data; heavy processing	Global lon/lat, correct distances out of the box

SRID 4326 on a geometry column is the #1 production bug. ST_Distance on two geometry(Point,4326) values returns a number in degrees — a "1 km" radius search silently matches points 111 km away. Fix: store as geography(Point,4326), or cast per-query: ST_Distance(a::geography, b::geography) → metres. A degree of longitude is ~111 km at the equator but shrinks to 0 at the poles, so degree "distance" is meaningless.

Defining a spatial table

-- typed, constrained columns (recommended over bare 'geometry') CREATE TABLE shelters ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name text NOT NULL, geom geometry(Point, 4326), -- planar; great for tile maps after transform location geography(Point, 4326) -- geodesic; metres for nearby-search ); -- registered metadata view (auto-maintained): SELECT f_table_name, f_geometry_column, srid, type FROM geometry_columns;

PostGIS 3.6.2 (Feb 2026) · pairs with PostgreSQL 18, GEOS 3.14, PROJ 8.2 · OGC Simple Features compliant.postgis.net/docs · workshops/postgis-intro

postgis=#\df ST_* -- spatial functions█11 / 12 · POSTGIS

PostGIS — Functions & Queries

Nearly all spatial functions are prefixed ST_ (Spatial Type, from the OGC standard). Grouped by purpose: construct geometries, inspect them, test relationships, measure, process, and output.

Constructors

Function	Builds geometry from
`ST_MakePoint(x,y[,z])`	Raw coordinates (no SRID!)
`ST_Point(x,y,srid)`	Coordinates + SRID
`ST_GeomFromText(wkt,srid)`	WKT string
`ST_GeogFromText(ewkt)`	Text → geography
`ST_GeomFromGeoJSON(json)`	GeoJSON
`ST_SetSRID(geom,srid)`	Labels SRID (no reproject)

ST_MakePoint returns SRID 0. Always wrap: ST_SetSRID(ST_MakePoint(lon,lat),4326).

Accessors / inspectors

Function	Returns
`ST_X(pt) / ST_Y(pt)`	Coordinate (geometry only)
`ST_GeometryType(g)`	`ST_Point` …
`ST_SRID(g)`	The SRID
`ST_NPoints(g)`	Vertex count
`ST_IsValid(g)`	OGC validity (self-intersect?)
`ST_AsText(g) / ST_AsEWKT(g)`	Back to WKT / EWKT

Many functions assume valid geometry. Repair with ST_MakeValid(geom) before area/overlay operations.

Spatial relationships — return boolean; most are index-aware

Function	True when
`ST_Intersects(a,b)`	They share any point (most common test)
`ST_Contains(a,b)` / `ST_Within(a,b)`	a fully contains b / b within a
`ST_Covers(a,b)` / `ST_CoveredBy`	Like contains, boundary-inclusive (preferred)
`ST_DWithin(a,b,d)`	Within distance d (metres for geography) — index-aware
`ST_Touches / ST_Crosses / ST_Overlaps`	Boundary-only / partial / same-dim overlap
`ST_Disjoint(a,b)`	Share no point (not index-aware — negation)

Measurement

Function	Returns
`ST_Distance(a,b)`	Min distance (SRID units / m for geog)
`ST_Length(line)`	Length
`ST_Area(poly)`	Area
`ST_Perimeter(poly)`	Boundary length
`ST_Azimuth(a,b)`	Bearing in radians

Processing / output

Function	Produces
`ST_Buffer(g,d)`	Zone within d (geometry)
`ST_Transform(g,srid)`	Reproject to new CRS
`ST_Union(g) / ST_Intersection`	Merge / overlap
`ST_Centroid / ST_Envelope`	Center / bounding box
`ST_Simplify(g,tol)`	Fewer vertices
`ST_AsGeoJSON / ST_AsMVT`	Web output / vector tiles

Worked query — nearest shelters within 2 km

-- geography column → distance in METRES, ST_DWithin uses the spatial index SELECT name, ST_Distance(location, ref.g) AS metres FROM shelters, (SELECT ST_SetSRID(ST_MakePoint(79.99, 14.44), 4326)::geography AS g) ref WHERE ST_DWithin(location, ref.g, 2000) -- 2000 m; prefiltered by GiST index ORDER BY location <-> ref.g -- KNN operator: true index-ordered nearest-first LIMIT 5;

Worked query — spatial join (points-in-polygon count)

-- "how many shelters in each district?" ST_Contains is index-aware SELECT d.name, count(s.*) AS n FROM districts d LEFT JOIN shelters s ON ST_Contains(d.geom, s.geom) GROUP BY d.name ORDER BY n DESC;

Constructor SRID discipline. The three most common "why is my result wrong" causes: forgetting SRID on ST_MakePoint (defaults to 0), storing lon/lat in geometry instead of geography, and passing lat/lon reversed — PostGIS expects (longitude, latitude), i.e. (X, Y).

~300 ST_ functions; geography supports a deliberate subset. Index-aware predicates auto-add a bounding-box prefilter.postgis.net/docs/reference.html

postgis=#CREATE INDEX … USING GIST█12 / 12 · POSTGIS

PostGIS — Spatial Indexing & Performance

A spatial index can't index irregular shapes directly, so it indexes each geometry's bounding box in a GiST R-Tree. Queries then run in two phases: a fast index filter, then an exact refine.

Create the index — and the one fatal typo

CREATE INDEX shelters_geom_gix ON shelters USING GIST (geom); ANALYZE shelters; -- refresh planner stats so the index actually gets chosen -- geography indexes identically; GiST handles the sphere/dateline/poles correctly CREATE INDEX shelters_geog_gix ON shelters USING GIST (location);

Omit USING GIST and you get a B-tree. A B-tree tries to index the whole geometry and errors with index row requires N bytes, maximum size is 8191 — and even when it builds, it can't answer spatial queries. The USING GIST clause is mandatory for spatial columns.

The filter-and-refine model

1

Index filterbounding box

GiST R-Tree returns rows whose bounding boxes overlap the query box. Fast, approximate, may include false positives. → candidates

2

Exact refinetrue geometry

The exact predicate (e.g. ST_Intersects) runs only on candidates, confirming real matches. → result

Index-aware functions (ST_Intersects, ST_Contains, ST_Covers, ST_DWithin, ST_Within …) inject phase 1 automatically. ST_DWithin internally expands the bounding box by the distance and applies && on both sides — which is why it's vastly faster than filtering on ST_Distance(...) < d.

Spatial operators

Op	Meaning	Use
`&&`	Bounding boxes overlap/touch (2-D)	Manual index prefilter; pair with exact test
`<->`	Distance between geometries (KNN)	`ORDER BY geom <-> pt` — index-ordered nearest-neighbour
`<#>`	Distance between bounding boxes (KNN)	Approximate nearest, cheapest
`~ / @`	Box contains / contained by	Pure box containment

Why ST_Relate & ST_Distance stay slow. They are not index-aware — no automatic && prefilter. For a bounding-box search without an exact predicate, add && yourself: WHERE a.geom && b.geom. For nearest-neighbour, prefer the <-> KNN operator over sorting by ST_Distance.

Index choice for spatial data

Index	When
GiST	Default. R-Tree over bounding boxes; handles all geometry & geography
SP-GiST	Point-heavy data with strong spatial clustering (space-partitioned)
BRIN	Huge tables already physically sorted by location; tiny index, weaker filtering — poor for trajectories

Confirm the index is used

EXPLAIN ANALYZE SELECT name FROM shelters WHERE ST_DWithin(location, :pt::geography, 2000); -- want to see: Index Scan using shelters_geog_gix (not Seq Scan)

If you still see Seq Scan: run ANALYZE (stale stats), check the function is index-aware, confirm both sides share one SRID/type, and ensure you didn't wrap the column in a function (ST_Transform(geom,…) in the WHERE defeats the index — index the transformed expression instead, or store a second projected column).

GiST R-Tree on bounding boxes · filter-then-refine · index-aware predicates auto-prefilter · <-> for true KNN. PostGIS 3.6.2.postgis.net/workshops/postgis-intro/indexing.html

postgis=#shp2pgsql · ogr2ogr · raster2pgsql█13 / 17 · POSTGIS

PostGIS — Loading & Exporting Data

Spatial data arrives as Shapefiles, GeoJSON, GeoPackage, rasters, or OSM extracts. These command-line loaders convert each into PostGIS rows. Every workflow — QGIS, IDRM ingest, ETL — starts here.

shp2pgsql — the Shapefile loader (ships with PostGIS)

# emits SQL; pipe straight into psql. Flags matter: shp2pgsql -I -D -s 4326 -g geom roads.shp public.roads | psql -d gisdb # -s 4326 source SRID (REQUIRED — wrong/missing SRID is the usual first bug) # -I build a GiST spatial index after load # -D dump format (much faster than INSERTs for big files) # -g geom name the geometry column 'geom' (QGIS-friendly) # mode: -c create (default) · -a append · -d drop+create · -p prepare-only # -G load as geography instead of geometry · -W LATIN1 fix encoding

ogr2ogr — the universal converter (GDAL; bundled with QGIS)

# reads ~100 formats. Destination FIRST, then source: ogr2ogr -f "PostgreSQL" \ PG:"host=localhost dbname=gisdb user=gis password=••• " \ shelters.geojson \ -nln shelters # new layer (table) name \ -nlt PROMOTE_TO_MULTI # Polygon → MultiPolygon (mixed-type safety) \ -lco GEOMETRY_NAME=geom # geometry column name \ -lco FID=gid # primary-key column (QGIS needs one) \ -t_srs EPSG:4326 # reproject on the way in

QGIS-ready defaults. Always set GEOMETRY_NAME=geom and FID=gid on import — QGIS needs a single integer key and looks for a conventionally named geometry column. PROMOTE_TO_MULTI avoids "found Polygon, expected MultiPolygon" failures on mixed Shapefiles.

raster2pgsql — rasters

raster2pgsql -I -C -s 4326 \ -t 256x256 -F \ flood_depth.tif public.flood | psql -d gisdb # -I index · -C constraints (registers in # raster_columns) · -t tile size # -s SRID · -F add filename column # -R register out-of-db (keep file on disk)

Tiling (-t) is essential — untiled rasters force whole-file reads. For IDRM flood/hazard layers, COG + -R keeps storage lean.

osm2pgsql — OpenStreetMap

osm2pgsql --create --database gisdb \ --slim -G andhra.osm.pbf # → planet_osm_point / _line # / _roads / _polygon tables

Base layer for basemaps & routing. Pair with osm2pgrouting to build a pgRouting network (poster 17).

Plain SQL & CSV — when you already have lon/lat columns

-- load a CSV of coordinates, then build geometry from the columns \copy raw_points(name, lon, lat) FROM 'incidents.csv' WITH (FORMAT csv, HEADER); ALTER TABLE raw_points ADD COLUMN geom geometry(Point, 4326); UPDATE raw_points SET geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326); -- (lon, lat) order!

Exporting back out

Target	Command
Shapefile	`pgsql2shp -f out.shp gisdb "SELECT * FROM roads"`
GeoJSON / GPKG / any	`ogr2ogr -f GPKG out.gpkg PG:"…" -sql "SELECT …"`
Inline GeoJSON	`SELECT ST_AsGeoJSON(geom) FROM roads;`

shp2pgsql & raster2pgsql ship with PostGIS · ogr2ogr/osm2pgsql via GDAL/QGIS · always pipe to psql, always set SRID.postgis.net/workshops/postgis-intro/loading_data.html

postgis=#-- QGIS layer + map production█14 / 17 · POSTGIS

PostGIS — QGIS Integration & Map Production

QGIS reads PostGIS tables directly as live layers. Three things make a table (or view) load cleanly: a registered geometry column, a declared SRID, and exactly one unique integer key.

What QGIS requires of a layer

Requirement	Why	How
Single unique key	QGIS needs a stable feature id to page & edit	A `bigint` PK; for views add one (below)
Typmod geometry column	So the SRID & type register correctly	`geometry(Point,4326)`, not bare `geometry`
Declared SRID	Reproject-on-the-fly & correct placement	`ST_SetSRID` / typed column / `ST_Transform`
Spatial index	Pan/zoom stays fast on big layers	`CREATE INDEX … USING GIST (geom)`

Views need a manufactured key. A view has no primary key, so QGIS may refuse it or pick the wrong column. Add a synthetic unique id and pick it in the "Add PostGIS layer" dialog: ROW_NUMBER() OVER () AS qgis_id. Also cast the geometry to typmod form in the view (geom::geometry(Point,4326)) so it registers in geometry_columns.

The metadata catalog QGIS reads

Object	Holds
`spatial_ref_sys`	Every SRID: `srid`, `auth_name` (EPSG), `srtext`, `proj4text`
`geometry_columns`	View: table, column, `coord_dimension`, `srid`, `type` for each geometry
`geography_columns`	Same, for geography columns (auto-maintained)

-- inspect what QGIS will see SELECT f_table_name, f_geometry_column, srid, type FROM geometry_columns; -- find an EPSG code by name (e.g. UTM 44N for Andhra Pradesh) SELECT srid, auth_name, srtext FROM spatial_ref_sys WHERE srtext ILIKE '%UTM zone 44N%'; -- confirm a column's SRID SELECT Find_SRID('public', 'shelters', 'geom');

Common projections for an Indian deployment

EPSG	System	Use
`4326`	WGS 84 lon/lat	Storage, GPS, exchange
`3857`	Web Mercator	Web/tile basemaps
`32644`	WGS 84 / UTM 44N	Metric measurement, most of AP/Telangana
`7755`	WGS 84 / India zone (NIANG)	National grid

Dual-projection pattern (store once, map two ways)

-- keep a metric copy alongside lon/lat so QGIS can measure AND web-map ALTER TABLE shelters ADD COLUMN geom_utm geometry(Point, 32644); UPDATE shelters SET geom_utm = ST_Transform(geom, 32644); CREATE INDEX ON shelters USING GIST (geom_utm); -- QGIS shows both columns as choosable layers

Import path tip. In QGIS, prefer Database → DB Manager → Import or "Export to PostgreSQL" over drag-and-drop — both use COPY under the hood (far faster than row-by-row INSERT), and let you set the SRID, primary key, and geometry column name at load time.

Registered geometry column + one unique key + declared SRID = a layer QGIS loads, styles, edits & reprojects live.postgis.net · QGIS PostGIS data provider

postgis=#ST_MakeValid · ST_Buffer · ST_Union█15 / 17 · POSTGIS

PostGIS — Validity & Geometry Processing

PostGIS follows the OGC rule that most functions assume valid, simple geometry. Real-world admin boundaries, flood polygons and digitized data routinely violate that — so validate first, then process.

Validity — inspect & repair

Function	Returns / does
`ST_IsValid(g)`	boolean — is it OGC-valid?
`ST_IsValidReason(g)`	Text: e.g. "Ring Self-intersection at …"
`ST_IsValidDetails(g)`	Reason + offending location
`ST_MakeValid(g)`	Repairs in place (accepts invalid input)
`ST_IsSimple(g)`	No self-intersection (lines/points)

-- find and fix invalid polygons in one pass, see what was wrong UPDATE districts SET geom = ST_MakeValid(geom) WHERE NOT ST_IsValid(geom) RETURNING name, ST_IsValidReason(geom); -- enforce validity going forward with a CHECK constraint ALTER TABLE districts ADD CONSTRAINT geom_valid CHECK (ST_IsValid(geom));

Why it matters. ST_Area, ST_Intersection, ST_Union and overlays give wrong answers or errors on invalid input. PostGIS does not validate on load (too slow), so an explicit ST_MakeValid step belongs in every ingest pipeline.

Processing & analysis toolkit

Function	Produces
`ST_Buffer(g,d)`	Zone within distance d (e.g. evacuation radius)
`ST_Union(g)`	Dissolve/merge many into one
`ST_Intersection(a,b)`	Overlapping portion
`ST_Difference(a,b)`	a minus b
`ST_ConvexHull(g)`	Tightest enclosing convex polygon

Function	Produces
`ST_Simplify(g,tol)`	Fewer vertices (may break topology)
`ST_SimplifyPreserveTopology`	Simplify without self-intersecting
`ST_Collect(g)`	Bundle into Multi* / collection (agg)
`ST_Dump(g)`	Explode collection → one row per part
`ST_Centroid / ST_PointOnSurface`	Center / guaranteed-inside point

Worked example — 5 km evacuation zone, dissolved

-- buffer each hazard point 5 km (metric SRID!), merge overlaps, count affected shelters WITH zone AS ( SELECT ST_Union(ST_Buffer(ST_Transform(geom, 32644), 5000)) AS g -- 5000 m in UTM 44N FROM hazards WHERE kind = 'flood' ) SELECT count(*) AS shelters_at_risk FROM shelters s, zone WHERE ST_Intersects(ST_Transform(s.geom, 32644), zone.g);

Buffer in metres, not degrees. ST_Buffer on a 4326 geometry buffers by degrees. Transform to a metric CRS (UTM/local grid) first, or use geography. Same rule as the distance trap on poster 10.

Validate → repair → process. ST_MakeValid before any overlay; ST_SimplifyPreserveTopology before any tiling.postgis.net/docs · ST_MakeValid, ST_IsValidReason

postgis=#ST_AsMVT · ST_ClusterDBSCAN█16 / 17 · POSTGIS

PostGIS — Vector Tiles & Clustering

Serve maps to the browser as Mapbox Vector Tiles generated in-database, and reduce thousands of incident points to meaningful clusters — both are single-query operations PostGIS does natively.

The canonical vector-tile pipeline

A tile server converts a {z}/{x}/{y} request into one query. Three functions cooperate: ST_TileEnvelope builds the tile's bounds, ST_AsMVTGeom clips & quantizes geometry into tile space, ST_AsMVT aggregates rows into the binary tile.

WITH mvtgeom AS ( SELECT ST_AsMVTGeom( ST_Transform(geom, 3857), -- tile space is Web Mercator ST_TileEnvelope(:z, :x, :y), -- bounds for this tile extent => 4096, buffer => 64 -- standard grid + edge buffer ) AS geom, name, kind -- attributes ride along into the tile FROM shelters WHERE geom && ST_Transform( -- && = index prefilter to this tile ST_TileEnvelope(:z, :x, :y, margin => 64.0/4096), 4326) ) SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom; -- → bytea, send as application/x-protobuf

Why each piece exists. The && filter uses the GiST index so only this tile's rows are read. ST_AsMVTGeom clips to the tile, fixes validity, and rescales to the 0–4096 integer grid. The buffer avoids clipped labels/lines at tile edges. Wrap as a z/x/y endpoint and any MapLibre/Leaflet client renders it.

Composite (multi-layer) tiles. UNION several ST_AsMVTGeom sub-selects, each tagged with a layer name passed as ST_AsMVT(rows.*, 'layername') — one request returns roads + shelters + hazards together.

Clustering — turn point swarms into insight

Function	Kind	Use for
`ST_ClusterDBSCAN(geom, eps, minpts)`	Window fn	Density hotspots; eps in CRS units, noise = NULL cluster
`ST_ClusterKMeans(geom, k)`	Window fn	Partition into exactly k groups (e.g. k zones)
`ST_ClusterWithin(geom, d)`	Aggregate	Group geometries within distance d
`ST_ClusterIntersecting(geom)`	Aggregate	Connected/touching sets

-- incident hotspots: DBSCAN, 250 m radius, ≥5 incidents to form a cluster SELECT id, kind, ST_ClusterDBSCAN(ST_Transform(geom, 32644), eps => 250, minpoints => 5) OVER () AS cluster_id -- window fn: every row keeps its cluster label FROM incidents; -- then summarize each cluster into a hull + count for the map SELECT cluster_id, count(*) AS n, ST_ConvexHull(ST_Collect(geom)) AS footprint FROM clustered WHERE cluster_id IS NOT NULL GROUP BY cluster_id;

DBSCAN vs K-Means. DBSCAN finds natural dense groups, leaves outliers unclustered (NULL), and needs no preset count — ideal for "where are incidents concentrating?". K-Means forces every point into exactly k clusters — use when you must split into a fixed number of response zones.

ST_TileEnvelope → ST_AsMVTGeom → ST_AsMVT = tiles from SQL · DBSCAN/KMeans are window functions over geometry. PostGIS 3.6.postgis.net/docs · ST_AsMVT, ST_ClusterDBSCAN

postgis=#CREATE EXTENSION pgrouting;█17 / 17 · PGROUTING

pgRouting — Response Logistics

pgRouting adds graph algorithms on top of PostGIS: shortest path, reachable area, cost matrices. The disaster-response capstone — route responders, compute coverage, assign the nearest unit.

The network model

Every algorithm runs over an edges SQL result exposing four columns (plus optional reverse cost):

CREATE EXTENSION IF NOT EXISTS pgrouting; -- requires PostGIS -- edges table contract: id · source · target · cost · [reverse_cost] -- source/target are vertex IDs at each end of the edge (the road segment) -- cost = travel impedance (length, or seconds = length / speed) -- reverse_cost lets one row model a two-way street with direction-specific cost -- build vertices & populate source/target from geometry (modern API): SELECT pgr_extractVertices('SELECT id, geom FROM roads'); -- (pgr_createTopology is DEPRECATED since pgRouting 3.8 — prefer pgr_extractVertices) CREATE INDEX ON roads (source); CREATE INDEX ON roads (target); -- essential for speed

Use a metric CRS for cost. If cost is segment length, the geometry must be in metres (UTM/local grid), not degrees — otherwise impedances are nonsense. Transform the network once at build time.

Core algorithms

Function	Answers
`pgr_dijkstra(edges, start, end)`	Shortest path between two vertices
`pgr_dijkstra(edges, start, end[])`	One-to-many shortest paths (one load of the graph)
`pgr_drivingDistance(edges, start, max)`	All vertices reachable within a cost budget → isochrone
`pgr_dijkstraCostMatrix(edges, vids)`	All-pairs cost matrix (assignment/VRP input)
`pgr_aStar(edges, start, end)`	Dijkstra + geometry heuristic — faster on big road nets
`pgr_KSP(edges, start, end, k)`	K shortest paths (alternative routes if one is blocked)

Worked — fastest route, ambulance → incident

SELECT seq, node, edge, cost, agg_cost FROM pgr_dijkstra( 'SELECT id, source, target, cost, reverse_cost FROM roads', :ambulance_vertex, -- start vertex id :incident_vertex, -- end vertex id directed => true -- respect one-way streets ); -- join edge ids back to roads.geom to draw the route on the map / emit GeoJSON

Worked — 10-minute reachable area (isochrone)

-- which areas can a unit reach within 600 s? (cost column = seconds) SELECT dd.node, dd.agg_cost, v.geom FROM pgr_drivingDistance( 'SELECT id, source, target, cost FROM roads', :station_vertex, 600, directed => false) dd JOIN roads_vertices_pgr v ON v.id = dd.node; -- wrap the reachable nodes in ST_ConcaveHull for a coverage polygon

IDRM patterns. drivingDistance → coverage gaps (areas no unit reaches in time); dijkstraCostMatrix → assign each incident to its nearest available responder; KSP → reroute when the primary path crosses a flooded/blocked edge (drop those edges from the edges-SQL with a WHERE).

Version note. Verified against pgRouting 3.8 on PostGIS 3.6 / PostgreSQL 18. pgr_createTopology & pgr_createVerticesTable are deprecated since 3.8 — use pgr_extractVertices. Returned columns & signatures are stable across the 3.x series.

Graph over PostGIS · edges SQL {id,source,target,cost[,reverse_cost]} · Dijkstra/A*/drivingDistance/costMatrix · metric CRS for costs.docs.pgrouting.org · pgr_dijkstra, pgr_drivingDistance

PostgreSQL Master Reference

The SELECT Statement — Grammar & Execution Flow

Complete SELECT grammar (written order)

The execution pipeline — tables in, table out

Relational-algebra equivalent

Data Types — the Vocabulary

Numeric

Text

Temporal

Boolean, UUID & semi-structured

Casting & literals

Filtering, NULL Logic & CASE

Predicate toolkit

Three-valued logic

NULL traps & fixes

Conditional expressions

Functions & Operators

String

Date / time

Math & conversion

Aggregates

JOINs, Set Ops & Subqueries

Join grammar

ON vs USING vs NATURAL

Set operations

Subqueries — four shapes

GROUP BY vs Window Functions

GROUP BY — collapses (N → G)

WINDOW — preserves (N → N)

The OVER clause grammar

Window function catalog

CTEs & Recursion

Grammar

Plain CTE — readability

MATERIALIZED control

Recursive CTE — walking a hierarchy

Input: employees

Output: chain

Beyond SELECT: Write, Define, Control

DML — manipulate data

DDL — define structure

TCL — transactions

DCL — access

Indexes & EXPLAIN Performance

Index types — which when

Targeted index recipes

Why an index gets ignored

Reading EXPLAIN

PostGIS — Spatial Types & SRID

Enable & verify

The geometry type hierarchy

Dimensionality suffixes

Representations

SRID — the coordinate reference system

⚠ The defining gotcha: geometry ≠ geography

Defining a spatial table

PostGIS — Functions & Queries

Constructors

Accessors / inspectors

Spatial relationships — return boolean; most are index-aware

Measurement

Processing / output

Worked query — nearest shelters within 2 km

Worked query — spatial join (points-in-polygon count)

PostGIS — Spatial Indexing & Performance

Create the index — and the one fatal typo

The filter-and-refine model

Spatial operators

Index choice for spatial data

Confirm the index is used

PostGIS — Loading & Exporting Data

shp2pgsql — the Shapefile loader (ships with PostGIS)

ogr2ogr — the universal converter (GDAL; bundled with QGIS)

raster2pgsql — rasters

osm2pgsql — OpenStreetMap

Plain SQL & CSV — when you already have lon/lat columns

Exporting back out

PostGIS — QGIS Integration & Map Production

What QGIS requires of a layer

The metadata catalog QGIS reads