postgres=# PostgreSQL Master Reference · v18 + PostGIS 3.6 · 17 posters contents 1234 56789 1011121314151617 ⎙ PDF
postgres=#\i master_reference.sql

PostgreSQL Master Reference

SQL grammar in EBNF, the logical data-flow that explains every rule, the practical fluency to write correct queries, and a rigorous PostGIS spatial extension — a progressive learning arc, not a feature dump.

Verified against PostgreSQL 18 (stable 18.4) & PostGIS 3.6.2 · notation: [ ] optional · { } repeat · | choice

Posters 10–17 cover PostGIS end-to-end: types, queries, indexing, data loading, QGIS, validity, vector tiles & routing.

01SELECT — Grammar & Execution FlowFull query EBNF + the 8-stage pipeline behind alias & aggregate rules 02Data Typestext, numeric, temporal, boolean, jsonb, arrays, uuid + casting 03Filtering, NULL Logic & CASEWHERE precision, three-valued logic, conditional expressions 04Functions & OperatorsString, date/time, math, conversion & the aggregate catalog 05JOINs, Set Ops & SubqueriesEvery way to combine relations: horizontal, vertical, nested 06GROUP BY vs Window FunctionsCollapse vs preserve, with the OVER grammar 07CTEs & RecursionWITH for readability; RECURSIVE for hierarchies, worked end-to-end 08DML · DDL · TCL · DCLWriting rows, defining structure, transactions, access control 09Indexes & EXPLAIN PerformanceWhich index when, reading the plan, costly mistakes 10PostGIS — Spatial Types & SRIDSetup, geometry hierarchy, WKT/EWKT, the geometry-vs-geography trap 11PostGIS — Functions & QueriesConstructors, relationships, measurement, processing, output 12PostGIS — Spatial Indexing & PerformanceGiST/R-Tree, the && prefilter, filter-and-refine, KNN search 13PostGIS — Loading & Exporting Datashp2pgsql, ogr2ogr, raster2pgsql, osm2pgsql, \copy, pgsql2shp 14PostGIS — QGIS Integration & MapsLayer requirements, unique key, metadata views, EPSG, projections 15PostGIS — Validity & ProcessingST_MakeValid, buffers, overlays, simplify, dump/collect 16PostGIS — Vector Tiles & ClusteringST_AsMVT pipeline, DBSCAN/KMeans hotspots, web delivery 17pgRouting — Response LogisticsNetwork build, Dijkstra, driving-distance isochrones, cost matrix
postgres=#\h SELECT01 / 09

The SELECT Statement — Grammar & Execution Flow

SQL is written in one order and executed in another. The grammar says what's legal; the pipeline says why — why an alias works in ORDER BY but not WHERE, why aggregates are banned in WHERE but allowed in HAVING.

nonterminal keyword metasymbol PostgreSQL extension

Complete SELECT grammar (written order)

select_query = [ with_clause ] query_body [ ORDER BY sort_item {"," sort_item} ] [ limit_clause ] [ lock ] ";" ; query_body = query_term { ( UNION | EXCEPT ) [ ALL ] query_term } ; query_term = query_prim { INTERSECT [ ALL ] query_prim } ; query_prim = select_block | VALUES rows | TABLE name | "(" query_body ")" ; select_block = SELECT [ ALL | DISTINCT [ ON "(" expr {"," expr} ")" ] ] select_list [ FROM from_item {"," from_item} ] -- PG: FROM optional → SELECT now(); [ WHERE condition ] [ GROUP BY group_item {"," group_item} ] [ HAVING condition ] [ WINDOW name AS "(" window_def ")" ] ; limit_clause = LIMIT (n|ALL) [ OFFSET m ] | OFFSET m [ FETCH (FIRST|NEXT) n ROWS (ONLY|WITH TIES) ] ;

The execution pipeline — tables in, table out

1
FROM / JOINrelation ⋈
Assemble sources; multiple tables cross-join, then JOIN conditions apply. → T₁
2
WHEREselect σ
Keep rows where predicate is TRUE (not FALSE/NULL). No aggregates — groups don't exist yet. → T₂
3
GROUP BYgroup γ
Partition by key; collapse each group to one row. Cardinality drops. → T₃
4
HAVINGselect σ
Filter group rows; aggregates legal here because groups now exist. → T₄
5
SELECTproject π
Evaluate output expressions & window functions; assign aliases — result columns now exist. → T₅
6
DISTINCTdedup δ
Remove duplicate result rows. → T₆
7
ORDER BYsort τ
Impose order; aliases from step 5 visible here — the only place order is guaranteed. → T₇
8
LIMIT / OFFSETslice
Skip OFFSET, return at most LIMIT. Delivered to client. → RESULT
The alias rule, explained by the pipeline. WHERE (2) runs before SELECT (5), so a SELECT alias is invisible to WHERE/HAVING — repeat the full expression. ORDER BY (7) runs after SELECT, so it can use the alias.

Relational-algebra equivalent

SELECT dept, SUM(sales) AS total FROM s WHERE region='East' GROUP BY dept HAVING SUM(sales)>150 ORDER BY total DESC; τtotal↓ ( πdept,total ( σSUM>150 ( deptγSUM(sales)→total ( σregion='East' ( s ) ) ) ) ) sort project HAVING GROUP+aggregate WHERE source
Three PostgreSQL divergences.LIMIT/OFFSET aren't standard (portable: OFFSET … FETCH). ② DISTINCT ON (k) keeps the first row per key — PG-only. ③ FROM may be omitted entirely.
Execution order is logical; the planner may reorder physically but results match this model.postgresql.org/docs/18/sql-select.html
postgres=#\dT -- data types02 / 09

Data Types — the Vocabulary

Every column, expression and comparison has a type. Choosing well prevents silent bugs (float money, naive timestamps) and unlocks the right operators and indexes.

Numeric

TypeUse forNote
smallint/int/bigintWhole numbers2 / 4 / 8 bytes
numeric(p,s)Money, exactArbitrary precision, no rounding error
real / double precisionScientificFloat — never for currency
serial / bigserialAuto-incrementLegacy; prefer GENERATED … IDENTITY

Text

TypeUse forNote
textDefault choiceUnlimited; no perf penalty in PG
varchar(n)Length-cappedEnforces a max; otherwise = text
char(n)Fixed widthSpace-padded; rarely wanted
text and varchar perform identically in PG — reach for text unless you need a hard length limit.

Temporal

TypeStores
dateCalendar day, no time
time [tz]Time of day
timestampDate+time, no zone (naive)
timestamptzZone-aware — use this
intervalDuration ('2 days 3 hours')
Default to timestamptz. It stores UTC and converts on display; timestamp drops zone context and causes off-by-hours bugs.

Boolean, UUID & semi-structured

TypeValues / use
booleanTRUE / FALSE / NULL
uuid128-bit id; gen_random_uuid()
byteaRaw binary
enumCREATE TYPE mood AS ENUM(…)
jsonbIndexed JSON; -> ->> @>
type[]Array: int[], text[]
tsvector / int4rangeFull-text / range types

Casting & literals

SELECT '2026-06-29'::date, -- PG cast shorthand :: CAST('42' AS int), -- SQL-standard cast 3.0::numeric / 2, -- 1.5 (int/int truncates to 1) '{1,2,3}'::int[], -- array literal '{"a":1}'::jsonb ->> 'a'; -- → '1' (text)
Integer division truncates. 5/2 = 2. Force real division by casting one side: 5::numeric/2 = 2.5.
Type drives operators, indexes & comparison semantics — choose deliberately.postgresql.org/docs/18/datatype.html
postgres=#-- WHERE · NULL · CASE03 / 09

Filtering, NULL Logic & CASE

WHERE keeps a row only when its condition is TRUE. NULL turns Boolean logic from two values into three — the source of more silent bugs than any other SQL feature.

Predicate toolkit

PredicateMatchesExample
= <> < > <= >=Comparisonprice >= 100
BETWEEN a AND bInclusive rangeage BETWEEN 18 AND 65
IN (…)Any of a setstate IN ('AP','TS')
LIKE / ILIKEPattern (% _); ILIKE case-insensitivename ILIKE 'kal%'
~ / ~*POSIX regex / case-insensitivecode ~ '^[A-Z]{3}$'
IS [NOT] NULLNull testdeleted_at IS NULL
IS [NOT] DISTINCT FROMNULL-safe equalitya IS NOT DISTINCT FROM b

Three-valued logic

Any comparison with NULL yields UNKNOWN. A row passes WHERE only on TRUE.

ANDTFNULL
TTFNULL
FFFF
NULLNULLFNULL
ORTFNULL
TTTT
FTFNULL
NULLTNULLNULL

NULL traps & fixes

TrapFix
x = NULL never truex IS NULL
NOT IN (…,NULL) → 0 rowsFilter NULLs / NOT EXISTS
COUNT(col) skips NULLsCOUNT(*) for total rows
a = b false when both NULLIS NOT DISTINCT FROM
'x'||NULL → NULLCOALESCE(col,'')
UNIQUE allows many NULLs. NULL ≠ NULL, so duplicate NULLs pass a unique constraint unless you add NULLS NOT DISTINCT (PG 15+).

Conditional expressions

-- CASE: the if/else of SQL SELECT name, CASE WHEN score >= 90 THEN 'A' WHEN score >= 75 THEN 'B' ELSE 'C' END AS grade FROM students; -- NULL helpers COALESCE(a, b, 0) -- first non-NULL argument NULLIF(x, 0) -- NULL if x=0 (guard division) GREATEST(a,b,c) / LEAST(a,b,c) -- ignore NULLs
Conditional aggregation — pivot without a pivot: SUM(CASE WHEN region='East' THEN sales ELSE 0 END). Cleaner: SUM(sales) FILTER (WHERE region='East').
WHERE filters rows (pre-group); HAVING filters groups (post-group).postgresql.org/docs/18/functions-conditional.html
postgres=#\df -- functions04 / 09

Functions & Operators

The working toolkit, grouped by what you're transforming. Scalar functions act per row; aggregates collapse many rows into one.

String

FunctionResult
a || bConcatenate
length(s)Character count
lower / upper / initcapCase transforms
trim / ltrim / rtrimStrip whitespace
substring(s,from,len)Extract slice
replace(s,old,new)Substitute
split_part(s,delim,n)Nth field
position(sub in s)Index of substring
format('%s=%s',k,v)printf-style build
string_agg(x,',')aggregate: join rows

Date / time

FunctionResult
now() / current_dateCurrent moment / day
age(ts)Interval from now
date_trunc('month',ts)Round down to unit
extract(year from ts)Pull a field
ts + interval '1 day'Date arithmetic
to_char(ts,'YYYY-MM')Format to text
to_date(s,'DD/MM/YYYY')Parse from text
generate_series(a,b,step)Row-set of values

Math & conversion

FunctionResult
round(x,n) / truncRound / truncate
ceil / floor / absStandard math
mod(a,b) / a % bRemainder
power(a,b) / sqrtExponent / root
random()0 ≤ x < 1
x::type / cast(x as t)Convert type

Aggregates

FunctionResult
count(*) / count(col)All rows / non-NULL
sum / avg / min / maxStandard rollups
array_agg(x)Collect into array
string_agg(x,sep)Collect into string
jsonb_agg(x)Collect into JSON
bool_and / bool_orLogical rollup
percentile_cont(0.5) WITHIN GROUP (ORDER BY x)Median
FILTER beats CASE inside aggregates. Instead of SUM(CASE WHEN paid THEN amt END), write SUM(amt) FILTER (WHERE paid) — clearer and NULL-clean. Any aggregate accepts FILTER (WHERE …).
DISTINCT inside an aggregate. COUNT(DISTINCT user_id) counts unique values; plain COUNT(user_id) counts non-NULL occurrences — they differ whenever duplicates exist.
Scalar = per row · Aggregate = many → one · both composable with window OVER.postgresql.org/docs/18/functions.html
postgres=#\d -- combine relations05 / 09

JOINs, Set Ops & Subqueries

Three ways to combine relations: JOINs widen rows (more columns), set operations stack rows (same columns), subqueries nest one query inside another.

Join grammar

from_item = source { join } ; join = ( [INNER] | (LEFT|RIGHT|FULL) [OUTER] ) JOIN source ( ON cond | USING "("col…")" ) | CROSS JOIN source | NATURAL JOIN source | [LATERAL] ;
JoinKeepsUnmatched rows
INNERMatching pairs onlyDropped both sides
LEFT OUTERAll left + matchesLeft kept, right = NULL
RIGHT OUTERAll right + matchesRight kept, left = NULL
FULL OUTEREverythingEither side padded NULL
CROSSEvery combinationCartesian product
LATERALPer-row correlated subqueryRight may reference left's columns

ON vs USING vs NATURAL

ON a.id=b.id — explicit, both columns survive.
USING (id) — equal-named cols, merged to one.
NATURAL JOIN — auto-matches all shared names; convenient but fragile — a new shared column silently changes results.

Set operations

q1 UNION [ALL] q2 -- combine q1 INTERSECT q2 -- in both q1 EXCEPT q2 -- in q1 not q2

Same column count & compatible types. Default dedups; ALL keeps dups. INTERSECT binds tighter than UNION/EXCEPT.

Subqueries — four shapes

-- 1. Scalar: returns one value, usable anywhere a value fits SELECT name, salary, (SELECT avg(salary) FROM emp) AS company_avg FROM emp; -- 2. IN / ANY / ALL: membership against a column of values SELECT * FROM emp WHERE dept_id IN (SELECT id FROM dept WHERE region='East'); -- 3. EXISTS: correlated true/false test — stops at first match, NULL-safe SELECT * FROM dept d WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id); -- 4. Derived table: a subquery in FROM, aliased like a table SELECT region, max_sal FROM (SELECT region, max(salary) max_sal FROM emp GROUP BY region) t;
EXISTS over IN for NULL safety. NOT IN (subquery) returns zero rows if the subquery yields a single NULL. NOT EXISTS has no such trap and usually plans better for anti-joins.
JOIN = wider rows · SET = stacked rows · SUBQUERY = nested logic. Correlated subqueries re-run per outer row.postgresql.org/docs/18/queries-table-expressions.html
postgres=#-- collapse vs keep06 / 09

GROUP BY vs Window Functions

Same arithmetic, opposite shapes. GROUP BY folds many rows into one summary. A window function keeps every row and attaches a value computed over a window of related rows.

GROUP BY — collapses (N → G)

SELECT dept, SUM(sales) AS total FROM s GROUP BY dept;
depttotal
East400
West800

Employees gone — you can't also select employee; no single value per group.

WINDOW — preserves (N → N)

SELECT employee, dept, sales, SUM(sales) OVER (PARTITION BY dept) AS dt FROM s;
empdeptsalesdt
AshaEast100400
BinaEast300400
DeepaWest600800

Every row survives and knows its group total. Detail and aggregate together.

The OVER clause grammar

window_call = func "("[args]")" [ FILTER "("WHERE cond")" ] OVER ( window_def ) ; window_def = [ PARTITION BY expr… ] -- like GROUP BY, rows NOT collapsed [ ORDER BY sort… ] -- order for ranking & running totals [ frame ] ; frame = (ROWS|RANGE|GROUPS) BETWEEN start AND end ; start/end = UNBOUNDED PRECEDING | n PRECEDING | CURRENT ROW | n FOLLOWING | UNBOUNDED FOLLOWING ;

Window function catalog

FunctionReturnsNeeds ORDER BY
ROW_NUMBER()Unique sequential indexYes
RANK() / DENSE_RANK()Rank with / without gaps after tiesYes
LAG(x,n) / LEAD(x,n)Value n rows back / aheadYes
SUM(x) OVER(ORDER BY …)Running / cumulative totalYes
NTILE(k)Split into k bucketsYes
FIRST_VALUE / LAST_VALUE / NTH_VALUEValue at frame edge / positionUsually
Decision rule. One row per group (a report)? → GROUP BY. Every row plus a peer-computed value (rank, running total, share, diff-from-avg)? → window OVER (…). Windowing runs after grouping, so the two can coexist.
Math: GROUP BY = γ in bag algebra (many→one). Window = per-row map over an ordered context (one→one). Neither is in pure relational calculus.postgresql.org/docs/18/tutorial-window.html
postgres=#WITH RECURSIVE …07 / 09

CTEs & Recursion

A Common Table Expression names a subquery up front, so complex logic reads top-to-bottom instead of inside-out. RECURSIVE extends this to hierarchies and graphs — the standard way to walk a tree.

Grammar

with_clause = WITH [ RECURSIVE ] cte {"," cte} ; cte = name [ "("col…")" ] AS [ [NOT] MATERIALIZED ] "(" query ")" ;

Plain CTE — readability

WITH regional AS ( SELECT region, SUM(sales) total FROM s GROUP BY region ) SELECT * FROM regional WHERE total > 500 ORDER BY total DESC;

Name the intermediate result once, reuse it. Chains of CTEs read as sequential steps.

MATERIALIZED control

PG may inline a CTE into the outer query for speed. Force "compute once" with AS MATERIALIZED, or allow inlining with NOT MATERIALIZED. Use MATERIALIZED when the CTE is expensive and referenced multiple times.

Recursive CTE — walking a hierarchy

Two parts joined by UNION ALL: a base case (anchor) and a recursive case that references the CTE itself. PG iterates until the recursive part returns no new rows.

-- employees(id, name, manager_id) → everyone under Asha, with depth WITH RECURSIVE chain AS ( -- base case: the starting node SELECT id, name, manager_id, 1 AS depth FROM employees WHERE name = 'Asha' UNION ALL -- recursive case: children of rows already found SELECT e.id, e.name, e.manager_id, c.depth + 1 FROM employees e JOIN chain c ON e.manager_id = c.id ) SELECT repeat(' ', depth-1) || name AS tree, depth FROM chain ORDER BY depth;

Input: employees

idnamemgr
1Asha
2Bina1
3Chandu1
4Deepa2

Output: chain

treedepth
Asha1
  Bina2
  Chandu2
    Deepa3
Guard against infinite loops. On cyclic graphs an unguarded recursive CTE never terminates. PG 14+ offers UNION … CYCLE col SET is_cycle USING path; otherwise track a visited path array and filter.
Data-modifying CTEs (PG extension). WITH moved AS (DELETE FROM old WHERE … RETURNING *) INSERT INTO archive SELECT * FROM moved; — delete and re-insert atomically in one statement.
CTEs = top-down readability · RECURSIVE = anchor + UNION ALL self-reference until fixpoint.postgresql.org/docs/18/queries-with.html
postgres=#\h -- DML·DDL·TCL·DCL08 / 09

Beyond SELECT: Write, Define, Control

DML changes rows · DDL changes structure · TCL controls transactions · DCL controls access.

DML — manipulate data

insert = INSERT INTO table ["("col…")"] ( VALUES row… | select_query ) [ ON CONFLICT [target] ( DO NOTHING | DO UPDATE SET col"="EXCLUDED.col ) ] -- UPSERT [ RETURNING ("*"|expr…) ] ; -- RETURNING = PG ext update = UPDATE table SET col"="expr… [ FROM from_item ] [ WHERE cond ] [ RETURNING ] ; delete = DELETE FROM table [ USING from_item ] [ WHERE cond ] [ RETURNING ] ; merge = MERGE INTO tgt USING src ON cond { WHEN [NOT] MATCHED THEN (INSERT|UPDATE|DELETE) } ;
WHERE-less UPDATE/DELETE hits every row. Wrap risky writes in BEGIN; … ROLLBACK; to preview the row count before committing.

DDL — define structure

create_table = CREATE [TEMP|UNLOGGED] TABLE [IF NOT EXISTS] name "(" col type {constraint} {"," } ")" [ PARTITION BY (RANGE|LIST|HASH) ] ; constraint = PRIMARY KEY | UNIQUE [NULLS NOT DISTINCT] | NOT NULL | DEFAULT expr | CHECK "("cond")" | GENERATED ALWAYS AS IDENTITY | REFERENCES tbl [ON DELETE action] ; alter = ALTER TABLE name ( ADD|DROP|ALTER COLUMN | ADD CONSTRAINT | RENAME ) ; drop = DROP (TABLE|VIEW|INDEX) [IF EXISTS] name [CASCADE|RESTRICT] ;

TCL — transactions

BEGIN [ISOLATION LEVEL lvl]; SAVEPOINT sp; ROLLBACK TO sp; COMMIT; ROLLBACK;

Levels: READ COMMITTED (default) · REPEATABLE READ · SERIALIZABLE. ACID is per-transaction.

DCL — access

GRANT priv ON obj TO role; REVOKE priv ON obj FROM role;

priv = SELECT/INSERT/UPDATE/DELETE/ALL. Row-level security via CREATE POLICY.

RETURNING, ON CONFLICT, data-modifying CTEs & NULLS NOT DISTINCT are PostgreSQL extensions.postgresql.org/docs/18/sql-commands.html
postgres=#EXPLAIN ANALYZE …09 / 09

Indexes & EXPLAIN Performance

An index is a shortcut the planner may use. Knowing which index suits which query — and reading the plan to confirm it's used — is the difference between a 5ms and a 5s query.

Index types — which when

TypeBest forExample
B-tree (default)Equality & range on scalars; sortingWHERE price > 100 ORDER BY price
HashEquality onlyWHERE id = 42
GINMulti-value: jsonb, arrays, full-textWHERE tags @> '{sql}'
GiSTGeometric, ranges, nearest-neighbourWHERE geom && box
BRINHuge, naturally-ordered tables (time-series)WHERE created_at > …
SP-GiSTNon-balanced: quadtrees, IP rangesWHERE inet && '10.0.0.0/8'

Targeted index recipes

-- composite: order matters (leftmost prefix) CREATE INDEX ON orders (cust_id, created_at); -- partial: index only rows you query CREATE INDEX ON users (email) WHERE active; -- expression: match the WHERE expression CREATE INDEX ON users (lower(email)); -- covering: serve query from index alone CREATE INDEX ON orders (cust_id) INCLUDE (total);

Why an index gets ignored

Function on the column: WHERE lower(email)=… skips a plain email index — index the expression instead.
Leading wildcard: LIKE '%x' can't use B-tree.
Type mismatch forces a cast.
Tiny table: a seq scan is genuinely faster.
Stale stats: run ANALYZE.

Reading EXPLAIN

EXPLAIN [ ANALYZE ] [ BUFFERS ] [ FORMAT (TEXT|JSON) ] statement ; -- ANALYZE actually runs it, showing real rows + timing (use a transaction for writes). Index Scan using orders_cust_idx on orders (cost=0.42..8.4 rows=1 width=36) (actual time=0.018..0.020 rows=1 loops=1) ← actual ≈ estimate = healthy
Plan nodeMeaningConcern
Seq ScanRead whole tableBad on large tables w/ selective filter
Index Scan / Index Only ScanUse index; "Only" = no heap fetchGood
Bitmap Heap ScanMany matching rows via indexFine for medium selectivity
Nested LoopProbe inner per outer rowBlows up if outer rows large
Hash Join / Merge JoinBuild hash / merge sorted inputsGood for large joins
The estimate-vs-actual gap is the tell. When the planner expected rows=1 but got rows=50000, its choices rested on wrong stats — run ANALYZE, then match the index to the actual WHERE/JOIN/ORDER BY columns.
Version note. Verified on PostgreSQL 18 (stable 18.4, May 2026); v19 GA expected Sept 2026 — additive only. Always run the latest minor of your major version.
Index = optional shortcut · EXPLAIN ANALYZE = ground truth · match index to real query shape.postgresql.org/docs/18/using-explain.html
postgis=#CREATE EXTENSION postgis;10 / 12 · POSTGIS

PostGIS — Spatial Types & SRID

PostGIS turns PostgreSQL into a spatial database: new column types, ~300 ST_ functions, and spatial indexes. Everything starts with two decisions — which geometry type, and geometry vs geography.

spatial type function / keyword operator critical gotcha

Enable & verify

CREATE EXTENSION IF NOT EXISTS postgis; -- core: types, functions, GiST support CREATE EXTENSION postgis_raster; -- optional: raster / coverage CREATE EXTENSION postgis_topology; -- optional: topological model SELECT postgis_full_version(); -- → POSTGIS 3.6.2, GEOS 3.14, PROJ 8.2 …

The geometry type hierarchy

TypeIsWKT example
POINTOne locationPOINT(77.6 14.4)
LINESTRINGOrdered verticesLINESTRING(0 0, 1 1, 2 1)
POLYGONRing(s); first = outerPOLYGON((0 0,4 0,4 4,0 4,0 0))
MULTIPOINTMany pointsMULTIPOINT((0 0),(1 1))
MULTILINESTRINGMany lines
MULTIPOLYGONMany polygons (islands)
GEOMETRYCOLLECTIONMixed bagGEOMETRYCOLLECTION(POINT(0 0),…)

Dimensionality suffixes

Append Z (elevation), M (measure), or ZM: POINTZ, LINESTRINGM, POINTZM.

Representations

FormMeaning
WKTWell-Known Text (human-readable)
WKBWell-Known Binary (storage/wire)
EWKT/EWKBPostGIS extended: embeds SRID + Z/M
SRID=4326;POINT(77.6 14.4)

SRID — the coordinate reference system

Every geometry carries an SRID (Spatial Reference ID) declaring how its numbers map to the Earth. Stored in the spatial_ref_sys table. The two you'll meet constantly:

SRIDSystemUnitsUse
4326WGS 84 lon/latdegreesGPS, global storage, web data
3857Web Mercatormetres (projected)Tile maps (Google/OSM/Leaflet)
326xx/327xxUTM zone N/SmetresAccurate regional measurement
Mixed SRIDs error out. ST_Distance(a,b) on SRID 4326 vs SRID 0 raises Operation on mixed SRID geometries. Reproject one with ST_Transform(geom, 3857) first. Use ST_SetSRID only to label an unlabelled geometry — it does not reproject.

⚠ The defining gotcha: geometry ≠ geography

 geometrygeography
MathFlat / Cartesian (planar)Curved / geodesic (spheroid)
UnitsWhatever the SRID says — 4326 ⇒ degreesAlways metres
SpeedFastSlower (great-circle math)
Function libraryFull (~300 funcs)Subset: ST_Distance, ST_DWithin, ST_Area, ST_Length, ST_Intersects, ST_Covers…
Best forProjected/local data; heavy processingGlobal lon/lat, correct distances out of the box
SRID 4326 on a geometry column is the #1 production bug. ST_Distance on two geometry(Point,4326) values returns a number in degrees — a "1 km" radius search silently matches points 111 km away. Fix: store as geography(Point,4326), or cast per-query: ST_Distance(a::geography, b::geography) → metres. A degree of longitude is ~111 km at the equator but shrinks to 0 at the poles, so degree "distance" is meaningless.

Defining a spatial table

-- typed, constrained columns (recommended over bare 'geometry') CREATE TABLE shelters ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name text NOT NULL, geom geometry(Point, 4326), -- planar; great for tile maps after transform location geography(Point, 4326) -- geodesic; metres for nearby-search ); -- registered metadata view (auto-maintained): SELECT f_table_name, f_geometry_column, srid, type FROM geometry_columns;
PostGIS 3.6.2 (Feb 2026) · pairs with PostgreSQL 18, GEOS 3.14, PROJ 8.2 · OGC Simple Features compliant.postgis.net/docs · workshops/postgis-intro
postgis=#\df ST_* -- spatial functions11 / 12 · POSTGIS

PostGIS — Functions & Queries

Nearly all spatial functions are prefixed ST_ (Spatial Type, from the OGC standard). Grouped by purpose: construct geometries, inspect them, test relationships, measure, process, and output.

Constructors

FunctionBuilds geometry from
ST_MakePoint(x,y[,z])Raw coordinates (no SRID!)
ST_Point(x,y,srid)Coordinates + SRID
ST_GeomFromText(wkt,srid)WKT string
ST_GeogFromText(ewkt)Text → geography
ST_GeomFromGeoJSON(json)GeoJSON
ST_SetSRID(geom,srid)Labels SRID (no reproject)
ST_MakePoint returns SRID 0. Always wrap: ST_SetSRID(ST_MakePoint(lon,lat),4326).

Accessors / inspectors

FunctionReturns
ST_X(pt) / ST_Y(pt)Coordinate (geometry only)
ST_GeometryType(g)ST_Point
ST_SRID(g)The SRID
ST_NPoints(g)Vertex count
ST_IsValid(g)OGC validity (self-intersect?)
ST_AsText(g) / ST_AsEWKT(g)Back to WKT / EWKT
Many functions assume valid geometry. Repair with ST_MakeValid(geom) before area/overlay operations.

Spatial relationships — return boolean; most are index-aware

FunctionTrue when
ST_Intersects(a,b)They share any point (most common test)
ST_Contains(a,b) / ST_Within(a,b)a fully contains b / b within a
ST_Covers(a,b) / ST_CoveredByLike contains, boundary-inclusive (preferred)
ST_DWithin(a,b,d)Within distance d (metres for geography) — index-aware
ST_Touches / ST_Crosses / ST_OverlapsBoundary-only / partial / same-dim overlap
ST_Disjoint(a,b)Share no point (not index-aware — negation)

Measurement

FunctionReturns
ST_Distance(a,b)Min distance (SRID units / m for geog)
ST_Length(line)Length
ST_Area(poly)Area
ST_Perimeter(poly)Boundary length
ST_Azimuth(a,b)Bearing in radians

Processing / output

FunctionProduces
ST_Buffer(g,d)Zone within d (geometry)
ST_Transform(g,srid)Reproject to new CRS
ST_Union(g) / ST_IntersectionMerge / overlap
ST_Centroid / ST_EnvelopeCenter / bounding box
ST_Simplify(g,tol)Fewer vertices
ST_AsGeoJSON / ST_AsMVTWeb output / vector tiles

Worked query — nearest shelters within 2 km

-- geography column → distance in METRES, ST_DWithin uses the spatial index SELECT name, ST_Distance(location, ref.g) AS metres FROM shelters, (SELECT ST_SetSRID(ST_MakePoint(79.99, 14.44), 4326)::geography AS g) ref WHERE ST_DWithin(location, ref.g, 2000) -- 2000 m; prefiltered by GiST index ORDER BY location <-> ref.g -- KNN operator: true index-ordered nearest-first LIMIT 5;

Worked query — spatial join (points-in-polygon count)

-- "how many shelters in each district?" ST_Contains is index-aware SELECT d.name, count(s.*) AS n FROM districts d LEFT JOIN shelters s ON ST_Contains(d.geom, s.geom) GROUP BY d.name ORDER BY n DESC;
Constructor SRID discipline. The three most common "why is my result wrong" causes: forgetting SRID on ST_MakePoint (defaults to 0), storing lon/lat in geometry instead of geography, and passing lat/lon reversed — PostGIS expects (longitude, latitude), i.e. (X, Y).
~300 ST_ functions; geography supports a deliberate subset. Index-aware predicates auto-add a bounding-box prefilter.postgis.net/docs/reference.html
postgis=#CREATE INDEX … USING GIST12 / 12 · POSTGIS

PostGIS — Spatial Indexing & Performance

A spatial index can't index irregular shapes directly, so it indexes each geometry's bounding box in a GiST R-Tree. Queries then run in two phases: a fast index filter, then an exact refine.

Create the index — and the one fatal typo

CREATE INDEX shelters_geom_gix ON shelters USING GIST (geom); ANALYZE shelters; -- refresh planner stats so the index actually gets chosen -- geography indexes identically; GiST handles the sphere/dateline/poles correctly CREATE INDEX shelters_geog_gix ON shelters USING GIST (location);
Omit USING GIST and you get a B-tree. A B-tree tries to index the whole geometry and errors with index row requires N bytes, maximum size is 8191 — and even when it builds, it can't answer spatial queries. The USING GIST clause is mandatory for spatial columns.

The filter-and-refine model

1
Index filterbounding box
GiST R-Tree returns rows whose bounding boxes overlap the query box. Fast, approximate, may include false positives. → candidates
2
Exact refinetrue geometry
The exact predicate (e.g. ST_Intersects) runs only on candidates, confirming real matches. → result

Index-aware functions (ST_Intersects, ST_Contains, ST_Covers, ST_DWithin, ST_Within …) inject phase 1 automatically. ST_DWithin internally expands the bounding box by the distance and applies && on both sides — which is why it's vastly faster than filtering on ST_Distance(...) < d.

Spatial operators

OpMeaningUse
&&Bounding boxes overlap/touch (2-D)Manual index prefilter; pair with exact test
<->Distance between geometries (KNN)ORDER BY geom <-> pt — index-ordered nearest-neighbour
<#>Distance between bounding boxes (KNN)Approximate nearest, cheapest
~ / @Box contains / contained byPure box containment
Why ST_Relate & ST_Distance stay slow. They are not index-aware — no automatic && prefilter. For a bounding-box search without an exact predicate, add && yourself: WHERE a.geom && b.geom. For nearest-neighbour, prefer the <-> KNN operator over sorting by ST_Distance.

Index choice for spatial data

IndexWhen
GiSTDefault. R-Tree over bounding boxes; handles all geometry & geography
SP-GiSTPoint-heavy data with strong spatial clustering (space-partitioned)
BRINHuge tables already physically sorted by location; tiny index, weaker filtering — poor for trajectories

Confirm the index is used

EXPLAIN ANALYZE SELECT name FROM shelters WHERE ST_DWithin(location, :pt::geography, 2000); -- want to see: Index Scan using shelters_geog_gix (not Seq Scan)
If you still see Seq Scan: run ANALYZE (stale stats), check the function is index-aware, confirm both sides share one SRID/type, and ensure you didn't wrap the column in a function (ST_Transform(geom,…) in the WHERE defeats the index — index the transformed expression instead, or store a second projected column).
GiST R-Tree on bounding boxes · filter-then-refine · index-aware predicates auto-prefilter · <-> for true KNN. PostGIS 3.6.2.postgis.net/workshops/postgis-intro/indexing.html
postgis=#shp2pgsql · ogr2ogr · raster2pgsql13 / 17 · POSTGIS

PostGIS — Loading & Exporting Data

Spatial data arrives as Shapefiles, GeoJSON, GeoPackage, rasters, or OSM extracts. These command-line loaders convert each into PostGIS rows. Every workflow — QGIS, IDRM ingest, ETL — starts here.

shp2pgsql — the Shapefile loader (ships with PostGIS)

# emits SQL; pipe straight into psql. Flags matter: shp2pgsql -I -D -s 4326 -g geom roads.shp public.roads | psql -d gisdb # -s 4326 source SRID (REQUIRED — wrong/missing SRID is the usual first bug) # -I build a GiST spatial index after load # -D dump format (much faster than INSERTs for big files) # -g geom name the geometry column 'geom' (QGIS-friendly) # mode: -c create (default) · -a append · -d drop+create · -p prepare-only # -G load as geography instead of geometry · -W LATIN1 fix encoding

ogr2ogr — the universal converter (GDAL; bundled with QGIS)

# reads ~100 formats. Destination FIRST, then source: ogr2ogr -f "PostgreSQL" \ PG:"host=localhost dbname=gisdb user=gis password=••• " \ shelters.geojson \ -nln shelters # new layer (table) name \ -nlt PROMOTE_TO_MULTI # Polygon → MultiPolygon (mixed-type safety) \ -lco GEOMETRY_NAME=geom # geometry column name \ -lco FID=gid # primary-key column (QGIS needs one) \ -t_srs EPSG:4326 # reproject on the way in
QGIS-ready defaults. Always set GEOMETRY_NAME=geom and FID=gid on import — QGIS needs a single integer key and looks for a conventionally named geometry column. PROMOTE_TO_MULTI avoids "found Polygon, expected MultiPolygon" failures on mixed Shapefiles.

raster2pgsql — rasters

raster2pgsql -I -C -s 4326 \ -t 256x256 -F \ flood_depth.tif public.flood | psql -d gisdb # -I index · -C constraints (registers in # raster_columns) · -t tile size # -s SRID · -F add filename column # -R register out-of-db (keep file on disk)

Tiling (-t) is essential — untiled rasters force whole-file reads. For IDRM flood/hazard layers, COG + -R keeps storage lean.

osm2pgsql — OpenStreetMap

osm2pgsql --create --database gisdb \ --slim -G andhra.osm.pbf # → planet_osm_point / _line # / _roads / _polygon tables

Base layer for basemaps & routing. Pair with osm2pgrouting to build a pgRouting network (poster 17).

Plain SQL & CSV — when you already have lon/lat columns

-- load a CSV of coordinates, then build geometry from the columns \copy raw_points(name, lon, lat) FROM 'incidents.csv' WITH (FORMAT csv, HEADER); ALTER TABLE raw_points ADD COLUMN geom geometry(Point, 4326); UPDATE raw_points SET geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326); -- (lon, lat) order!

Exporting back out

TargetCommand
Shapefilepgsql2shp -f out.shp gisdb "SELECT * FROM roads"
GeoJSON / GPKG / anyogr2ogr -f GPKG out.gpkg PG:"…" -sql "SELECT …"
Inline GeoJSONSELECT ST_AsGeoJSON(geom) FROM roads;
shp2pgsql & raster2pgsql ship with PostGIS · ogr2ogr/osm2pgsql via GDAL/QGIS · always pipe to psql, always set SRID.postgis.net/workshops/postgis-intro/loading_data.html
postgis=#-- QGIS layer + map production14 / 17 · POSTGIS

PostGIS — QGIS Integration & Map Production

QGIS reads PostGIS tables directly as live layers. Three things make a table (or view) load cleanly: a registered geometry column, a declared SRID, and exactly one unique integer key.

What QGIS requires of a layer

RequirementWhyHow
Single unique keyQGIS needs a stable feature id to page & editA bigint PK; for views add one (below)
Typmod geometry columnSo the SRID & type register correctlygeometry(Point,4326), not bare geometry
Declared SRIDReproject-on-the-fly & correct placementST_SetSRID / typed column / ST_Transform
Spatial indexPan/zoom stays fast on big layersCREATE INDEX … USING GIST (geom)
Views need a manufactured key. A view has no primary key, so QGIS may refuse it or pick the wrong column. Add a synthetic unique id and pick it in the "Add PostGIS layer" dialog: ROW_NUMBER() OVER () AS qgis_id. Also cast the geometry to typmod form in the view (geom::geometry(Point,4326)) so it registers in geometry_columns.

The metadata catalog QGIS reads

ObjectHolds
spatial_ref_sysEvery SRID: srid, auth_name (EPSG), srtext, proj4text
geometry_columnsView: table, column, coord_dimension, srid, type for each geometry
geography_columnsSame, for geography columns (auto-maintained)
-- inspect what QGIS will see SELECT f_table_name, f_geometry_column, srid, type FROM geometry_columns; -- find an EPSG code by name (e.g. UTM 44N for Andhra Pradesh) SELECT srid, auth_name, srtext FROM spatial_ref_sys WHERE srtext ILIKE '%UTM zone 44N%'; -- confirm a column's SRID SELECT Find_SRID('public', 'shelters', 'geom');

Common projections for an Indian deployment

EPSGSystemUse
4326WGS 84 lon/latStorage, GPS, exchange
3857Web MercatorWeb/tile basemaps
32644WGS 84 / UTM 44NMetric measurement, most of AP/Telangana
7755WGS 84 / India zone (NIANG)National grid

Dual-projection pattern (store once, map two ways)

-- keep a metric copy alongside lon/lat so QGIS can measure AND web-map ALTER TABLE shelters ADD COLUMN geom_utm geometry(Point, 32644); UPDATE shelters SET geom_utm = ST_Transform(geom, 32644); CREATE INDEX ON shelters USING GIST (geom_utm); -- QGIS shows both columns as choosable layers
Import path tip. In QGIS, prefer Database → DB Manager → Import or "Export to PostgreSQL" over drag-and-drop — both use COPY under the hood (far faster than row-by-row INSERT), and let you set the SRID, primary key, and geometry column name at load time.
Registered geometry column + one unique key + declared SRID = a layer QGIS loads, styles, edits & reprojects live.postgis.net · QGIS PostGIS data provider
postgis=#ST_MakeValid · ST_Buffer · ST_Union15 / 17 · POSTGIS

PostGIS — Validity & Geometry Processing

PostGIS follows the OGC rule that most functions assume valid, simple geometry. Real-world admin boundaries, flood polygons and digitized data routinely violate that — so validate first, then process.

Validity — inspect & repair

FunctionReturns / does
ST_IsValid(g)boolean — is it OGC-valid?
ST_IsValidReason(g)Text: e.g. "Ring Self-intersection at …"
ST_IsValidDetails(g)Reason + offending location
ST_MakeValid(g)Repairs in place (accepts invalid input)
ST_IsSimple(g)No self-intersection (lines/points)
-- find and fix invalid polygons in one pass, see what was wrong UPDATE districts SET geom = ST_MakeValid(geom) WHERE NOT ST_IsValid(geom) RETURNING name, ST_IsValidReason(geom); -- enforce validity going forward with a CHECK constraint ALTER TABLE districts ADD CONSTRAINT geom_valid CHECK (ST_IsValid(geom));
Why it matters. ST_Area, ST_Intersection, ST_Union and overlays give wrong answers or errors on invalid input. PostGIS does not validate on load (too slow), so an explicit ST_MakeValid step belongs in every ingest pipeline.

Processing & analysis toolkit

FunctionProduces
ST_Buffer(g,d)Zone within distance d (e.g. evacuation radius)
ST_Union(g)Dissolve/merge many into one
ST_Intersection(a,b)Overlapping portion
ST_Difference(a,b)a minus b
ST_ConvexHull(g)Tightest enclosing convex polygon
FunctionProduces
ST_Simplify(g,tol)Fewer vertices (may break topology)
ST_SimplifyPreserveTopologySimplify without self-intersecting
ST_Collect(g)Bundle into Multi* / collection (agg)
ST_Dump(g)Explode collection → one row per part
ST_Centroid / ST_PointOnSurfaceCenter / guaranteed-inside point

Worked example — 5 km evacuation zone, dissolved

-- buffer each hazard point 5 km (metric SRID!), merge overlaps, count affected shelters WITH zone AS ( SELECT ST_Union(ST_Buffer(ST_Transform(geom, 32644), 5000)) AS g -- 5000 m in UTM 44N FROM hazards WHERE kind = 'flood' ) SELECT count(*) AS shelters_at_risk FROM shelters s, zone WHERE ST_Intersects(ST_Transform(s.geom, 32644), zone.g);
Buffer in metres, not degrees. ST_Buffer on a 4326 geometry buffers by degrees. Transform to a metric CRS (UTM/local grid) first, or use geography. Same rule as the distance trap on poster 10.
Validate → repair → process. ST_MakeValid before any overlay; ST_SimplifyPreserveTopology before any tiling.postgis.net/docs · ST_MakeValid, ST_IsValidReason
postgis=#ST_AsMVT · ST_ClusterDBSCAN16 / 17 · POSTGIS

PostGIS — Vector Tiles & Clustering

Serve maps to the browser as Mapbox Vector Tiles generated in-database, and reduce thousands of incident points to meaningful clusters — both are single-query operations PostGIS does natively.

The canonical vector-tile pipeline

A tile server converts a {z}/{x}/{y} request into one query. Three functions cooperate: ST_TileEnvelope builds the tile's bounds, ST_AsMVTGeom clips & quantizes geometry into tile space, ST_AsMVT aggregates rows into the binary tile.

WITH mvtgeom AS ( SELECT ST_AsMVTGeom( ST_Transform(geom, 3857), -- tile space is Web Mercator ST_TileEnvelope(:z, :x, :y), -- bounds for this tile extent => 4096, buffer => 64 -- standard grid + edge buffer ) AS geom, name, kind -- attributes ride along into the tile FROM shelters WHERE geom && ST_Transform( -- && = index prefilter to this tile ST_TileEnvelope(:z, :x, :y, margin => 64.0/4096), 4326) ) SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom; -- → bytea, send as application/x-protobuf
Why each piece exists. The && filter uses the GiST index so only this tile's rows are read. ST_AsMVTGeom clips to the tile, fixes validity, and rescales to the 0–4096 integer grid. The buffer avoids clipped labels/lines at tile edges. Wrap as a z/x/y endpoint and any MapLibre/Leaflet client renders it.
Composite (multi-layer) tiles. UNION several ST_AsMVTGeom sub-selects, each tagged with a layer name passed as ST_AsMVT(rows.*, 'layername') — one request returns roads + shelters + hazards together.

Clustering — turn point swarms into insight

FunctionKindUse for
ST_ClusterDBSCAN(geom, eps, minpts)Window fnDensity hotspots; eps in CRS units, noise = NULL cluster
ST_ClusterKMeans(geom, k)Window fnPartition into exactly k groups (e.g. k zones)
ST_ClusterWithin(geom, d)AggregateGroup geometries within distance d
ST_ClusterIntersecting(geom)AggregateConnected/touching sets
-- incident hotspots: DBSCAN, 250 m radius, ≥5 incidents to form a cluster SELECT id, kind, ST_ClusterDBSCAN(ST_Transform(geom, 32644), eps => 250, minpoints => 5) OVER () AS cluster_id -- window fn: every row keeps its cluster label FROM incidents; -- then summarize each cluster into a hull + count for the map SELECT cluster_id, count(*) AS n, ST_ConvexHull(ST_Collect(geom)) AS footprint FROM clustered WHERE cluster_id IS NOT NULL GROUP BY cluster_id;
DBSCAN vs K-Means. DBSCAN finds natural dense groups, leaves outliers unclustered (NULL), and needs no preset count — ideal for "where are incidents concentrating?". K-Means forces every point into exactly k clusters — use when you must split into a fixed number of response zones.
ST_TileEnvelope → ST_AsMVTGeom → ST_AsMVT = tiles from SQL · DBSCAN/KMeans are window functions over geometry. PostGIS 3.6.postgis.net/docs · ST_AsMVT, ST_ClusterDBSCAN
postgis=#CREATE EXTENSION pgrouting;17 / 17 · PGROUTING

pgRouting — Response Logistics

pgRouting adds graph algorithms on top of PostGIS: shortest path, reachable area, cost matrices. The disaster-response capstone — route responders, compute coverage, assign the nearest unit.

The network model

Every algorithm runs over an edges SQL result exposing four columns (plus optional reverse cost):

CREATE EXTENSION IF NOT EXISTS pgrouting; -- requires PostGIS -- edges table contract: id · source · target · cost · [reverse_cost] -- source/target are vertex IDs at each end of the edge (the road segment) -- cost = travel impedance (length, or seconds = length / speed) -- reverse_cost lets one row model a two-way street with direction-specific cost -- build vertices & populate source/target from geometry (modern API): SELECT pgr_extractVertices('SELECT id, geom FROM roads'); -- (pgr_createTopology is DEPRECATED since pgRouting 3.8 — prefer pgr_extractVertices) CREATE INDEX ON roads (source); CREATE INDEX ON roads (target); -- essential for speed
Use a metric CRS for cost. If cost is segment length, the geometry must be in metres (UTM/local grid), not degrees — otherwise impedances are nonsense. Transform the network once at build time.

Core algorithms

FunctionAnswers
pgr_dijkstra(edges, start, end)Shortest path between two vertices
pgr_dijkstra(edges, start, end[])One-to-many shortest paths (one load of the graph)
pgr_drivingDistance(edges, start, max)All vertices reachable within a cost budget → isochrone
pgr_dijkstraCostMatrix(edges, vids)All-pairs cost matrix (assignment/VRP input)
pgr_aStar(edges, start, end)Dijkstra + geometry heuristic — faster on big road nets
pgr_KSP(edges, start, end, k)K shortest paths (alternative routes if one is blocked)

Worked — fastest route, ambulance → incident

SELECT seq, node, edge, cost, agg_cost FROM pgr_dijkstra( 'SELECT id, source, target, cost, reverse_cost FROM roads', :ambulance_vertex, -- start vertex id :incident_vertex, -- end vertex id directed => true -- respect one-way streets ); -- join edge ids back to roads.geom to draw the route on the map / emit GeoJSON

Worked — 10-minute reachable area (isochrone)

-- which areas can a unit reach within 600 s? (cost column = seconds) SELECT dd.node, dd.agg_cost, v.geom FROM pgr_drivingDistance( 'SELECT id, source, target, cost FROM roads', :station_vertex, 600, directed => false) dd JOIN roads_vertices_pgr v ON v.id = dd.node; -- wrap the reachable nodes in ST_ConcaveHull for a coverage polygon
IDRM patterns. drivingDistance → coverage gaps (areas no unit reaches in time); dijkstraCostMatrix → assign each incident to its nearest available responder; KSP → reroute when the primary path crosses a flooded/blocked edge (drop those edges from the edges-SQL with a WHERE).
Version note. Verified against pgRouting 3.8 on PostGIS 3.6 / PostgreSQL 18. pgr_createTopology & pgr_createVerticesTable are deprecated since 3.8 — use pgr_extractVertices. Returned columns & signatures are stable across the 3.x series.
Graph over PostGIS · edges SQL {id,source,target,cost[,reverse_cost]} · Dijkstra/A*/drivingDistance/costMatrix · metric CRS for costs.docs.pgrouting.org · pgr_dijkstra, pgr_drivingDistance